Ergebnisse der Mathematik und ihrer Grenzgebiete, 3. Folge 
A Series of Modern Surveys in Mathematics 60 


Michel Talagrand 


Upper and 
Lower Bounds 
for Stochastic 
Processes 


Springer


Ergebnisse der Mathematik und ihrer 
Grenzgebiete. 3. Folge / A Series of Modern 
Surveys in Mathematics 


Volume 60 


Series Editors 


L. Ambrosio, Pisa 

V. Baladi, Paris 

G.-M. Greuel, Kaiserslautern 
M. Gromov, Bures-sur-Yvette
G. Huisken, Tübingen

J. Jost, Leipzig 

J. Kollár, Princeton

G. Laumon, Orsay 

U. Tillmann, Oxford 

J. Tits, Paris 

D. B. Zagier, Bonn 


Ergebnisse der Mathematik und ihrer Grenzgebiete, now in its third sequence, aims 
to provide summary reports, on a high level, on important topics of mathematical 
research. Each book is designed as a reliable reference covering a significant area 
of advanced mathematics, spelling out related open questions, and incorporating a 
comprehensive, up-to-date bibliography. 


More information about this series at http://www.springer.com/series/728 


Michel Talagrand 


Upper and Lower Bounds 
for Stochastic Processes 


Decomposition Theorems 


Second Edition 


Springer


Michel Talagrand 
Paris, France 


ISSN 0071-1136 ISSN 2197-5655 (electronic) 

Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys 
in Mathematics 

ISBN 978-3-030-82594-2 ISBN 978-3-030-82595-9 (eBook) 
https://doi.org/10.1007/978-3-030-82595-9 


Mathematics Subject Classification: 60G17, 60G15, 60E07, 46B09 


© Springer Nature Switzerland AG 2021 

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of 
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, 
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information 
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology 
now known or hereafter developed. 

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication 
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant 
protective laws and regulations and therefore free for general use. 

The publisher, the authors, and the editors are safe to assume that the advice and information in this book 
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or 
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any 
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional 
claims in published maps and institutional affiliations. 


This Springer imprint is published by the registered company Springer Nature Switzerland AG. 
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland 


Dedicated to the memory of Xavier Fernique. 


Xavier Fernique (1934-2020), by Pierre Fernique 


Preface 


This book had a previous edition [132]. The changes between the two editions are 
not only cosmetic or pedagogical, and the degree of improvement in the mathematics
themselves is almost embarrassing at times. Besides significant simplifications in
the arguments, several of the main conjectures of [132] have been solved and a new 
direction came to fruition. It would have been more appropriate to publish this text 
as a brand new book, but the improvements occurred gradually and the bureaucratic 
constraints of the editor did not allow a change at a late stage without further delay 
and uncertainty. 

We first explain in broad terms the contents of this book, and then we detail some 
of the changes from [132]. 

What is the maximum level a certain river is likely to reach over the next 25 
years? What is the likely magnitude of the strongest earthquake to occur during 
the life of a planned nuclear plant? These fundamental practical questions have 
motivated (arguably also fundamental) mathematics, some of which is the object
of this book. The value $X_t$ of the quantity of interest at time $t$ is modeled by a
random variable. What can be said about the maximum value of $X_t$ over a certain
range of t? How can we guarantee that, with probability close to one, this maximum 
will not exceed a given threshold? 

A collection of random variables $(X_t)_{t\in T}$, where $t$ belongs to a certain index
set T, is called a stochastic process, and the topic of this book is the study of the 
suprema of certain stochastic processes, and more precisely the search for upper and 
lower bounds for these suprema. The keyword of the book is 


INEQUALITIES. 


The “classical theory of processes” deals mostly with the case where T is a subset 
of the real line or of $\mathbb{R}^n$. We do not focus on that case, and the book does not really
expand on the most basic and robust results which are important in this situation. 
Our most important index sets are “high-dimensional”: the large sets of data which 
are currently the focus of so much attention consist of data which usually depend on 
many parameters. Our specific goal is to demonstrate the impact and the range of 
modern abstract methods, in particular through their treatment of several classical
questions which are not accessible to “classical methods.” 

Andrey Kolmogorov invented the most important idea to bound stochastic 
processes: chaining. This wonderfully efficient method answers with little effort 
a number of basic questions but fails to provide a complete understanding, even in 
natural situations. This is best discussed in the case of Gaussian processes, where the 
family $(X_t)_{t\in T}$ consists of centered jointly Gaussian random variables (r.v.s). These
are arguably the most important of all. A Gaussian process defines in a canonical 
manner a distance d on its index set T by the formula 


$$d(s,t) = \big(\mathbb{E}(X_s - X_t)^2\big)^{1/2}. \tag{0.1}$$


Probably the single most important conceptual progress about Gaussian processes 
was the gradual realization that the metric space $(T,d)$ is the key object to
understand them, even if $T$ happens to be an interval of the real line. This led
Richard Dudley to develop in 1967 an abstract version of Kolmogorov’s chaining
argument adapted to this situation. The resulting very efficient bound for Gaussian 
processes is unfortunately not always tight. Roughly speaking, “there sometimes 
remains a parasitic logarithmic factor in the estimates”. 

The discovery around 1985 (by Xavier Fernique and the author) of a precise (and 
in a sense, exact) relationship between the “size” of a Gaussian process and the 
“size” of this metric space provided the missing understanding in the case of these 
processes. Attempts to extend this result to other processes spanned a body of work 
that forms the core of this book. 

A significant part of the book is devoted to situations where skill is required to 
“remove the last parasitic logarithm in the estimates”. These situations occur with 
unexpected frequency in all kinds of problems. A particularly striking example is 
as follows. Consider $n^2$ independent random points $(X_i)_{i\le n^2}$, which are
uniformly distributed in the unit square $[0,1]^2$. How far is a typical sample from
being very uniformly spread on the unit square? To measure this we construct a
one-to-one map $\pi$ from $\{1,\dots,n^2\}$ to the vertices $v_1,\dots,v_{n^2}$ of a uniform $n\times n$
grid in the unit square. If we try to minimize the average distance between $X_i$ and
$v_{\pi(i)}$, we can do as well as about $\sqrt{\log n}/n$ but no better. If we try to minimize the
maximum distance between $X_i$ and $v_{\pi(i)}$, we can do as well as about $(\log n)^{3/4}/n$
but no better. The factor $1/n$ is just due to scaling, but the fractional powers of $\log n$
require a surprising amount of work.

The book is largely self-contained, but it mostly deals with rather subtle questions 
such as the previous one. It also devotes considerable energy to the problem of 
finding lower bounds for certain processes, a topic far more difficult and less 
developed than the search for upper bounds. Even though some of the main ideas 
of at least Chap. 2 could (and should!) be taught at an elementary level, this is an 
advanced text. 

This book is in a sense a continuation of the monograph [53], or at least of part of 
it. I made no attempt to cover again all the relevant material of [53], but familiarity 
with [53] is certainly not a prerequisite and maybe not even helpful. The way certain
results are presented there is arguably obsolete, and, more importantly, many of the 
problems considered in [53] (in particular, limit theorems) are no longer the focus 
of much interest. 

One of my main goals is to communicate as much as possible of my experience 
from working on stochastic processes, and I have covered most of my results in this 
area. A number of these results were proved many years ago. I still like them, but 
some seem to be waiting for their first reader. The odds of these results meeting 
this first reader while staying buried in the original papers seemed nil, but they 
might increase in the present book form. In order to present a somewhat coherent 
body of work, I have also included rather recent results by others in the same 
general direction.¹ I find these results deep and very beautiful. They are sometimes
difficult to access for the non-specialist. Explaining them here in a unified and often 
simplified presentation could serve a useful purpose. Still, the choice of topics is 
highly personal and does not represent a systematic effort to cover all the important 
directions. I can only hope that the book contains enough state-of-the-art knowledge
about sufficiently many fundamental questions to be useful. 

Let me now try to outline the progress since the previous edition.² While
attempting to explain my results better to others, I ended up understanding them
much better myself. The material of the previous edition was reduced by about
100 pages due to better proofs.³ More importantly, reexamination of the material
resulted in new methods, and a new direction came to fruition, that of 


DECOMPOSITION THEOREMS. 


The basic idea is that there are two fundamentally different ways to control the size 
of a sum $\sum_{i\le N} X_i$. One may take advantage of cancellations between terms, or one
may bound the sum by the sum of the absolute values. One may also interpolate
between the two methods, which in that case means writing a decomposition $X_i =
X_i' + X_i''$ and controlling the size of the sum $\sum_{i\le N} X_i'$ by taking advantage of the
cancellations between terms, but controlling the sum $\sum_{i\le N} X_i''$ by the sum of the
absolute values. The same schoolboy idea, in the setting of stochastic processes, is 
that a process can be bounded on the one hand using chaining, and on the other 
hand can often be bounded by cruder methods, involving replacing certain sums by 
the sums of the absolute values. The amazing fact is that many processes can be 
controlled by interpolating between these two methods, that is, can be decomposed
into the sum of two pieces, each of which can be controlled by one of these methods. 
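To make the interpolation concrete, here is a toy numerical sketch for a Bernoulli process $X_t = \sum_i \varepsilon_i t_i$ with independent random signs $\varepsilon_i$, indexed by a small finite set $T$ of hypothetical vectors (our own choice, not taken from the book). It compares three upper bounds for $\mathbb{E}\sup_{t\in T} X_t$. The first is the sum of absolute values. The second is a cancellation bound for finite sets, $\max_t \|t\|_2 \sqrt{2\log \operatorname{card} T}$, the standard sub-Gaussian union bound, used here only as a stand-in for chaining. The third is the interpolation obtained from $T \subset T_1 + T_2$.

```python
from itertools import product
from math import log, sqrt
import random

N = 12                       # number of independent random signs eps_1, ..., eps_N
rng = random.Random(0)
# Hypothetical index set T = {e_j + v_j : j < N}: e_j is the j-th unit vector ("spiky" part),
# v_j has all entries equal to +/-0.06 ("spread-out" part).
E = [[1.0 if i == j else 0.0 for i in range(N)] for j in range(N)]
V = [[0.06 * rng.choice((-1, 1)) for _ in range(N)] for _ in range(N)]
T = [[e + v for e, v in zip(ej, vj)] for ej, vj in zip(E, V)]


def expected_sup(vectors):
    """Exact E sup_t sum_i eps_i t_i, averaging over all 2**N sign patterns."""
    total = 0.0
    for eps in product((-1, 1), repeat=N):
        total += max(sum(s * x for s, x in zip(eps, t)) for t in vectors)
    return total / 2 ** N


def abs_values_bound(vectors):
    """No cancellation used: sup_t sum_i |t_i|."""
    return max(sum(abs(x) for x in t) for t in vectors)


def cancellation_bound(vectors):
    """Cancellation for a finite set: max_t ||t||_2 * sqrt(2 log card T)."""
    return max(sqrt(sum(x * x for x in t)) for t in vectors) * sqrt(2 * log(len(vectors)))


exact = expected_sup(T)
split = abs_values_bound(E) + cancellation_bound(V)   # valid because T is contained in E + V
print(f"exact {exact:.3f}  abs values {abs_values_bound(T):.3f}  "
      f"cancellation {cancellation_bound(T):.3f}  split {split:.3f}")
```

In this toy example the split bound is smaller than both pure bounds. The spiky parts $e_j$ defeat the cancellation bound, and the spread-out parts $v_j$ defeat the sum of absolute values.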


¹ With one single exception I did not include results by others proved after the first edition of this
book. 

² A precise comparison between the two editions may be found in Appendix G.

³ A limited quantity of material of secondary importance was also removed. The current edition
is not shorter than the previous one because many details have been added, as well as an entire 
chapter on the new results, and a sketch of proof for many exercises. 
Such is the nature of the landmark result of Bednorz and Latała [16], the proof of the
Bernoulli conjecture, which is the towering result of this book. Several conjectures
of [132] in the same general direction have been solved,⁴ concerning in particular
empirical processes and random series of functions. 

Despite the considerable progress represented by the solution of these conjectures,
a number of seemingly important questions remain open, and one of
my main goals is to popularize these. Opinions differ as to what constitutes a 
really important problem, but I like those I explain here because they deal with 
fundamental structures. These problems might be challenging. At least, I tried my 
best to make progress on them, but they have seen little progress and received little 
attention. 

I would like to express my infinite gratitude to Shahar Mendelson. While he 
was donating his time to help another of my projects, it became clear through 
our interactions that, while I had made great efforts toward the quality of the
mathematics contained in my books, I certainly had not put enough effort into
the exposition of this material. I concluded that there should be real room for 
improvement in the text of [132], and this is why I started to revise it, and this 
led to the major advances presented here. 

While preparing the current text I have been helped by a number of people. I 
would like to thank some of them here (and to apologize to all those whom I do not 
mention). Ramon van Handel suggested a few almost embarrassing simplifications. 
Hengrui Luo and Zhenyuan Zhang suggested literally hundreds of improvements,
and Rafał Meller’s comments had a great impact too. Further luck had it that, almost
at the last minute, my text attracted the attention of Kevin Tanguy whose efforts 
resulted in a higher level of detail and a gentler pace of exposition. In particular, his 
and Zhang’s efforts gave me the energy to make a fresh attempt at explaining and 
detailing the proof of the Bernoulli conjecture obtained by Bednorz and Latała in
[16]. This proof is the most stupendously beautiful piece of mathematics I have met
in my entire life. I hope that the power of this result and the beauty of this proof will
become better understood.

I dedicate this work to the memory of Xavier Fernique. Fernique was a 
deeply original thinker. His groundbreaking contributions to the theory of Gaussian 
processes were viewed as exotic by mainstream probabilists, and he never got the 
recognition he deserved. I owe a great debt to Fernique: it is his work on Gaussian 
processes which made my own work possible, first on Gaussian processes, and then 
on all the situations beyond this case. This work occupied many of my most fruitful 
years. A large part of it is presented in the present volume. It would not have existed 
without Fernique’s breakthroughs. 


Paris, France Michel Talagrand 


⁴ After another crucial contribution of Bednorz and Martynek [18].
Chapter 1
What Is This Book About?


This short chapter describes the philosophy underlying this book and some of its 
highlights. This description, often using words rather than formulas, is necessarily 
imprecise and is only intended to provide some insight into our point of view. 


1.1 Philosophy 


The practitioner of stochastic processes is likely to be struggling at any given time 
with his favorite model of the moment, a model which typically involves a rich 
and complicated structure. There is a near infinite supply of such models. The 
importance with which we view any one of them is likely to vary over time. 

The first advice I received from my advisor Gustave Choquet was as follows: 
always consider a problem under the minimum structure in which it makes sense. 
This advice has literally shaped my mathematical life. It will probably be as fruitful 
in the future as it has been in the past. By following it, one is naturally led to study 
problems with a kind of minimal and intrinsic structure. Not so many structures are 
really basic, and one may hope that these will remain of interest for a very long time. 
This book is devoted to the study of such structures which arise when one tries to 
estimate the suprema of stochastic processes. 

The feeling, real or imaginary, that one is studying objects of intrinsic importance 
is enjoyable, but the success of the approach of studying “minimal structures” has 
ultimately to be judged by its results. As we shall demonstrate, the tools arising from 
this approach provide the final words on a number of classical problems. 
1.2 What Is Chaining?


A stochastic process is a collection of random variables (r.v.s) $(X_t)_{t\in T}$ indexed by
a set $T$. To study it, Kolmogorov invented chaining, the main tool of this book.
The fundamental idea of chaining is to replace the index set $T$ by a sequence of
finite approximations $T_n$, and to study the r.v.s $X_t$ through successive approximations
$X_{\pi_n(t)}$ where $\pi_n(t) \in T_n$. The first approximation consists of a single point $t_0$, so
$T_0 = \{t_0\}$. The fundamental relation is then

$$X_t - X_{t_0} = \sum_{n\ge 1} \big(X_{\pi_n(t)} - X_{\pi_{n-1}(t)}\big). \tag{1.1}$$

When $T$ is finite, the only case we really need, the sum on the right is finite. This
relation gives its name to the method: the chain of approximations $\pi_n(t)$ links $t_0$ and
$t$. To control the differences $X_t - X_{t_0}$, it then suffices to control all the differences
$|X_{\pi_n(t)} - X_{\pi_{n-1}(t)}|$.
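The identity (1.1) is purely algebraic, and a few lines of Python make this visible. This sketch assumes $T = \{j/2^N : 0 \le j < 2^N\}$ and represents points by their integer index $j$. It takes for $\pi_n(t)$ the point of $T_n = \{i/2^n\}$ just below $t$. This rounding rule and the $\pm 1$ random walk used as $X$ are illustrative choices, not prescribed by the text.

```python
import random


def pi(j, n, N):
    """Level-N index of pi_n(t) for t = j / 2**N: round t down to the grid T_n = {i / 2**n}."""
    shift = N - n
    return (j >> shift) << shift


def chain(j, N):
    """The chain t_0 = pi_0(t), pi_1(t), ..., pi_N(t) = t, as level-N indices."""
    return [pi(j, n, N) for n in range(N + 1)]


def telescoping_sum(X, j, N):
    """Right-hand side of (1.1): the sum over n >= 1 of X_{pi_n(t)} - X_{pi_{n-1}(t)}."""
    c = chain(j, N)
    return sum(X[c[n]] - X[c[n - 1]] for n in range(1, N + 1))


# Hypothetical process: a simple +/-1 random walk on the points j / 2**N
# (integer values make the identity checkable exactly).
N = 6
rng = random.Random(1)
X = [0]
for _ in range(2 ** N - 1):
    X.append(X[-1] + rng.choice((-1, 1)))

for j in range(2 ** N):
    assert telescoping_sum(X, j, N) == X[j] - X[0]   # (1.1) with t_0 = 0
print(chain(45, N))   # prints [0, 32, 32, 40, 44, 44, 45]
```

Repeated entries in a chain (here $\pi_1(t) = \pi_2(t)$) simply contribute zero increments to the sum.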


1.3 The Kolmogorov Conditions


Kolmogorov stated the “Kolmogorov conditions”, which robustly ensure the good 
behavior of a stochastic process indexed by a subset of $\mathbb{R}^m$. These conditions are
studied in any advanced probability course. If you have taken such a course, this 
section will refresh your memory about these conditions, and the next few sections 
will present the natural generalization of the chaining method in an abstract metric 
space, as it was understood in, say, 1970. Learning in detail about these historical 
developments now makes sense only if you have already heard of them, because the 
modern chaining method, which is presented in Chap. 2, is in a sense far simpler 
than the classical method. For this reason, the material up to and including Sect. 1.4 is
directed toward a reader who is already fluent in probability theory. If, on the other 
hand, you have never heard of these things and if you find this material too difficult, 
you should start directly with Chap. 2, which is written at a far greater level of detail 
and assumes minimal familiarity with even basic probability theory. 

We say that a process $(X_t)_{t\in T}$, where $T = [0,1]^m$, satisfies the Kolmogorov
conditions if

$$\forall s,t \in [0,1]^m, \quad \mathbb{E}|X_s - X_t|^p \le d(s,t)^\alpha, \tag{1.2}$$

where $d(s,t)$ denotes the Euclidean distance and $p > 0$, $\alpha > m$. Here $\mathbb{E}$ denotes
mathematical expectation. In our notation, the operator $\mathbb{E}$ applies to whatever
expression is placed after it, so that $\mathbb{E}|Y|^p$ stands for $\mathbb{E}(|Y|^p)$ and not for $(\mathbb{E}|Y|)^p$.
This convention is in force throughout the book. 
Let us apply the idea of chaining to processes satisfying the Kolmogorov
conditions. The most obvious candidate for the approximating set $T_n$ is the set $G_n$
of points $x$ in $[0,1[^m$ such that the coordinates of $2^n x$ are nonnegative integers.¹ Thus,
$\operatorname{card} G_n = 2^{nm}$. It is completely natural to choose $\pi_n(u) \in G_n$ as close to $u$ as
possible, so that $d(u, \pi_n(u)) \le \sqrt{m}\,2^{-n}$ and $d(\pi_n(u), \pi_{n-1}(u)) \le d(\pi_n(u), u) + d(u, \pi_{n-1}(u)) \le 3\sqrt{m}\,2^{-n}$.

For $n \ge 1$, let us then define

$$U_n = \{(s,t)\,;\ s \in G_n,\ t \in G_n,\ d(s,t) \le 3\sqrt{m}\,2^{-n}\}. \tag{1.3}$$

Given $s = (s_1,\dots,s_m) \in G_n$, the number of points $t = (t_1,\dots,t_m) \in G_n$ with
$d(s,t) \le 3\sqrt{m}\,2^{-n}$ is bounded independently of $s$ and $n$ because $|t_i - s_i| \le d(s,t)$
for each $i \le m$, so that we have the crucial property

$$\operatorname{card} U_n \le K(m)\,2^{nm}, \tag{1.4}$$

where $K(m)$ denotes a number depending only on $m$, which need not be the same
on each occurrence. Consider then the r.v.

$$Y_n = \max\{|X_s - X_t|\,;\ (s,t) \in U_n\}, \tag{1.5}$$

so that (since $G_{n-1} \subset G_n$) for each $u$,

$$|X_{\pi_n(u)} - X_{\pi_{n-1}(u)}| \le Y_n. \tag{1.6}$$
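The bound (1.4) can be checked by direct counting. The following Python sketch (our own illustration) counts the ordered pairs in $U_n$ by enumerating the admissible integer offsets $o = 2^n(t-s)$. It then compares the count with the constant $K(m) = (2\lfloor 3\sqrt m\rfloor + 1)^m$, which comes from the coordinatewise bound $|t_i - s_i| \le d(s,t)$. This particular value of $K(m)$ is just one admissible choice.

```python
from itertools import product
from math import isqrt


def card_U(n, m):
    """card U_n of (1.3), with G_n = {k / 2**n : k in {0, ..., 2**n - 1}**m}.

    A pair (s, t) is in U_n iff the integer offset o = 2**n (t - s) satisfies
    o_1**2 + ... + o_m**2 <= 9 m; for each such o we count the s with s + o in the grid.
    """
    r = isqrt(9 * m)                  # |o_i| <= floor(3 sqrt(m)) for every admissible offset
    size = 2 ** n
    total = 0
    for o in product(range(-r, r + 1), repeat=m):
        if sum(x * x for x in o) <= 9 * m:
            count = 1
            for x in o:
                count *= max(size - abs(x), 0)
            total += count
    return total


def K(m):
    """One admissible constant in (1.4): at most 2 floor(3 sqrt(m)) + 1 choices per coordinate."""
    return (2 * isqrt(9 * m) + 1) ** m


for n in range(7):
    c = card_U(n, 2)
    print(n, c, c / 4 ** n, "<=", K(2))
```

Counting offsets instead of pairs keeps the cost independent of $n$, so the ratio $\operatorname{card} U_n / 2^{nm}$ can be inspected for fairly large $n$.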
To avoid having to explain what is “a version of the process” and since we care only
about inequalities, we will consider only the r.v.s $X_t$ for $t \in G := \bigcup_{n\ge 0} G_n$. We
first claim that

$$\sup_{s,t\in G\,;\ d(s,t)\le 2^{-k}} |X_s - X_t| \le 3\sum_{n\ge k} Y_n. \tag{1.7}$$

To prove this, consider $n \ge k$ such that $s,t \in G_n$, so that $s = \pi_n(s)$ and $t = \pi_n(t)$.
Assuming $d(s,t) \le 2^{-k}$, we have

$$d(\pi_k(s), \pi_k(t)) \le d(s, \pi_k(s)) + d(s,t) + d(t, \pi_k(t)) \le 3\sqrt{m}\,2^{-k},$$

so that $(\pi_k(s), \pi_k(t)) \in U_k$ and thus

$$|X_{\pi_k(s)} - X_{\pi_k(t)}| \le Y_k.$$


¹ The only reason for using the points $x$ in $[0,1[^m$ such that the coordinates of $2^n x$ are
nonnegative integers, rather than the points $x$ in $[0,1]^m$ with the same property, is that there
are $2^{nm}$ such points rather than the typographically unpleasant number $(2^n+1)^m$.


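Next, for $u \in \{s,t\}$,

$$X_u - X_{\pi_k(u)} = X_{\pi_n(u)} - X_{\pi_k(u)} = \sum_{k\le \ell<n} \big(X_{\pi_{\ell+1}(u)} - X_{\pi_\ell(u)}\big),$$

and since $|X_{\pi_{\ell+1}(u)} - X_{\pi_\ell(u)}| \le Y_{\ell+1}$, we obtain $|X_u - X_{\pi_k(u)}| \le \sum_{k\le\ell<n} Y_{\ell+1}$. To
obtain (1.7), we then use the previous inequalities and the identity

$$X_s - X_t = X_s - X_{\pi_k(s)} + X_{\pi_k(s)} - X_{\pi_k(t)} + X_{\pi_k(t)} - X_t.$$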
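Inequality (1.7) is a deterministic statement about any function $X$ on $G$, so it can be sanity-checked on a single path. This Python sketch makes three assumptions. It takes $m = 1$. It stops at a finest level $N$, so that $G = G_N$ and all chains end at level $N$. It uses a simulated Brownian path as a hypothetical example of $X$. Brownian motion satisfies (1.2) with $p = 4$ and $\alpha = 2$, up to the constant 3 in $\mathbb{E}|B_s - B_t|^4 = 3|s-t|^2$. For each $k$ the sketch prints both sides of (1.7).

```python
import random


def brownian_on_grid(N, seed=0):
    """Simulated Brownian path at the points j / 2**N, j = 0, ..., 2**N - 1 (here G = G_N)."""
    rng = random.Random(seed)
    sd = 2.0 ** (-N / 2)              # each increment has variance 2**-N
    path, x = [], 0.0
    for _ in range(2 ** N):
        path.append(x)
        x += rng.gauss(0.0, sd)
    return path


def check_17(X, N):
    """Both sides of (1.7) for m = 1 and each k = 0..N, with all chains stopped at level N."""
    Y = []                            # Y[n] as in (1.5)
    for n in range(N + 1):
        step = 2 ** (N - n)           # the points of G_n are every step-th point of G_N
        Y.append(max(abs(X[a * step] - X[b * step])
                     for a in range(2 ** n)
                     for b in range(max(0, a - 3), min(2 ** n, a + 4))))   # d(s, t) <= 3 * 2**-n
    rows = []
    for k in range(N + 1):
        width = 2 ** (N - k)          # d(s, t) <= 2**-k  <=>  index gap <= width
        lhs = max(abs(X[i] - X[j])
                  for i in range(2 ** N)
                  for j in range(i, min(2 ** N, i + width + 1)))
        rows.append((k, lhs, 3 * sum(Y[k:])))
    return rows


N = 8
for k, lhs, rhs in check_17(brownian_on_grid(N), N):
    print(f"k={k}: sup = {lhs:.3f} <= 3 * sum of Y_n = {rhs:.3f}")
```

The right-hand side is typically far from tight, which reflects the pessimism of bounding every increment of step $n$ by $Y_n$.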


Let us now draw some consequences of (1.7). For a finite family of numbers
$V_i \ge 0$, we have

$$\big(\max_i V_i\big)^p \le \sum_i V_i^p, \tag{1.8}$$

and thus

$$\mathbb{E} Y_n^p \le \mathbb{E} \sum_{(s,t)\in U_n} |X_s - X_t|^p \le K(m,\alpha)\,2^{nm - n\alpha},$$

since $\mathbb{E}|X_s - X_t|^p \le K(m,\alpha)\,2^{-n\alpha}$ for $(s,t) \in U_n$ by (1.2), and using (1.4). To
proceed, one needs to distinguish whether or not $p \ge 1$. For specificity, we assume
$p \ge 1$. Since, as we just proved, $\|Y_n\|_p := (\mathbb{E}|Y_n|^p)^{1/p} \le K(m,p,\alpha)\,2^{n(m-\alpha)/p}$,
the triangle inequality in $L^p$ yields²

$$\Big\|\sum_{n\ge k} Y_n\Big\|_p \le \sum_{n\ge k} K(m,p,\alpha)\,2^{n(m-\alpha)/p} \le K(m,p,\alpha)\,2^{k(m-\alpha)/p}. \tag{1.9}$$

Combining with (1.7), we then obtain

$$\Big\|\sup_{s,t\in G\,;\ d(s,t)\le 2^{-k}} |X_s - X_t|\Big\|_p \le K(m,p,\alpha)\,2^{k(m-\alpha)/p}, \tag{1.10}$$

a sharp inequality from which it is then simple to prove (with some loss of
sharpness) results such as the fact that for $0 < \beta < \alpha - m$, one has

$$\mathbb{E} \sup_{s,t\in G,\ s\neq t} \frac{|X_s - X_t|^p}{d(s,t)^\beta} < \infty. \tag{1.11}$$

² Here, of course, the two occurrences of the constant $K(m,p,\alpha)$ are not the same.


Exercise 1.3.1 Prove (1.11). Hint: Prove that

$$\mathbb{E} \sup_{k\ge 0}\, 2^{k\beta} \sup_{s,t\in G\,;\ d(s,t)\le 2^{-k}} |X_s - X_t|^p < \infty. \tag{1.12}$$


Thus, chaining not only proves that the process $(X_t)_{t\in T}$ has a continuous version;
it also provides the very good estimate (1.10). One reason why everything is
so easy in this case is that the size of the terms $X_{\pi_{n+1}(u)} - X_{\pi_n(u)}$ decreases like a
geometric series.

Let us then pause for a moment and reflect on what we have been doing. 


• The Euclidean metric structure of $T$ is not really intrinsic to the problem. Far
more intrinsic is the (quasi) distance on $T$ given by $\delta(s,t) = \|X_s - X_t\|_p$. The
condition (1.2), which we may now write as $\delta(s,t) \le d(s,t)^{\alpha/p}$, simply enforces
a kind of “smallness condition” on the metric space $(T,\delta)$.

• The use of the bound (1.6) is rather pessimistic, as it bounds each of the
increments along the chain by $Y_n$, the worst possible case among all increments of the same step.


These two remarks contain in germ much of the future progress we will make. 
Following the first remark, we will learn, starting with the next section, to look at 
problems in a more intrinsic manner. And our sharp chaining methods will avoid the 
crude bound of each increment by the worst possible case. 

There are many variations on the previous ideas. The next two exercises explore 
one. 


Exercise 1.3.2 Consider a convex function $\varphi \ge 0$ with $\varphi(0) = 0$. Prove that for
r.v.s $V_i \ge 0$ one has

$$\mathbb{E}\max_i V_i \le \varphi^{-1}\Big(\sum_i \mathbb{E}\varphi(V_i)\Big). \tag{1.13}$$
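Since (1.13) holds for every probability distribution of $(V_i)$, it holds in particular for the empirical distribution of a finite sample, so it can be checked numerically. This Python sketch makes that check for $\varphi(x) = \exp(x^2/4) - 1$ (the function used later in (1.17)) and $V_i = |g_i|$ with independent standard Gaussians $g_i$. The sample sizes are arbitrary choices.

```python
import math
import random


def phi(x):
    """phi(x) = exp(x**2 / 4) - 1: convex, phi >= 0, phi(0) = 0."""
    return math.exp(x * x / 4) - 1


def phi_inv(y):
    """Inverse of phi on [0, infinity): 2 sqrt(log(1 + y))."""
    return 2 * math.sqrt(math.log1p(y))


rng = random.Random(0)
n_vars, n_samples = 50, 4000
# Each row is one draw of (V_1, ..., V_50) with V_i = |g_i|, g_i independent standard Gaussians.
rows = [[abs(rng.gauss(0.0, 1.0)) for _ in range(n_vars)] for _ in range(n_samples)]

lhs = sum(max(r) for r in rows) / n_samples                                # E max_i V_i
rhs = phi_inv(sum(sum(phi(r[i]) for r in rows) / n_samples for i in range(n_vars)))
print(f"E max V_i = {lhs:.3f} <= phi^(-1)(sum_i E phi(V_i)) = {rhs:.3f}")
```

Because $\mathbb{E}\varphi(|g|) = \sqrt 2 - 1$, the right-hand side is of order $2\sqrt{\log(1 + 0.41\,n)}$ for $n$ variables, which already shows the $\sqrt{\log n}$ growth of Gaussian maxima.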


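Exercise 1.3.3 Consider the function $\varphi$ as above, and consider positive numbers
$c_n$, $d_n$. Assume that the process $(X_t)_{t\in T}$ satisfies

$$\forall n \ge 0,\ \forall s,t \in T,\quad d(s,t) \le 3\sqrt{m}\,2^{-n} \Rightarrow \mathbb{E}\varphi\Big(\frac{|X_s - X_t|}{c_n}\Big) \le d_n. \tag{1.14}$$

Prove that

$$\mathbb{E}\sup_{s,t\in G\,;\ d(s,t)\le 2^{-k}} |X_s - X_t| \le 3\sum_{n\ge k} c_n\,\varphi^{-1}\big(K(m)\,2^{nm} d_n\big). \tag{1.15}$$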




The series in (1.15) has no reason to converge like a geometric series, so we are already
being more sophisticated than in the case of the Kolmogorov conditions.³


1.4 Chaining in a Metric Space: Dudley’s Bound 


Suppose now that we want to study the uniform convergence on $[0,1]$ of a random
Fourier series $X_t = \sum_{k\ge 1} a_k g_k \cos(2\pi k t)$ where $a_k$ are numbers and $(g_k)$ are
independent standard Gaussian r.v.s. The Euclidean structure of $[0,1]$ is not intrinsic
to the problem. Far more relevant is the distance $d$ given by

$$d(s,t)^2 = \mathbb{E}(X_s - X_t)^2 = \sum_k a_k^2 \big(\cos(2\pi k s) - \cos(2\pi k t)\big)^2. \tag{1.16}$$


This simple idea took a very long time to emerge. Once one thinks about the distance 
d, then in turn the fact that the index set T is [0, 1] is no longer very relevant 
because this particular structure does not connect very well with the distance d. 
One is then led to consider Gaussian processes indexed by an abstract set $T$.⁴ We
say that $(X_t)_{t\in T}$ is a Gaussian process when the family $(X_t)_{t\in T}$ is jointly Gaussian
and centered.⁵ Then, just as in (1.16), the process induces a canonical distance $d$ on
$T$ given by $d(s,t) = (\mathbb{E}(X_s - X_t)^2)^{1/2}$. We will express that Gaussian r.v.s have
small tails by the inequality

$$\forall s,t \in T,\quad \mathbb{E}\varphi\Big(\frac{|X_s - X_t|}{d(s,t)}\Big) \le 1, \tag{1.17}$$

where $\varphi(x) = \exp(x^2/4) - 1$. This inequality holds because if $g$ is a standard
Gaussian r.v., then $\mathbb{E}\exp(g^2/4) \le 2$.⁶

To perform chaining for such a process, in the absence of further structure on our 
metric space $(T,d)$, how do we choose the approximating sets $T_n$? Thinking back to
the Kolmogorov conditions, it is very natural to introduce the following definition: 


³ In the left-hand side of (1.15), we would like to do better than controlling the expectation, but one
really needs some regularity of the function $\varphi$ for this. It suffices here to say that when $\varphi(x) = |x|^p$
for $p \ge 1$, we may replace the expectation by the norm of $L^p$, proceeding exactly as we did in the
case of the Kolmogorov conditions.


⁴ Let us stress the point. Even though the index set is a subset of $\mathbb{R}^m$, we have no chance to really
understand the process unless we forget this irrelevant structure. 

⁵ Centered means that $\mathbb{E}X_t = 0$ for each $t$.

⁶ Starting with the next chapter, we will control the r.v.s $|X_s - X_t|$ through their tail properties,
and (1.17) is just another way to present the same situation. 




Definition 1.4.1 For $\varepsilon > 0$, the covering number $N(T,d,\varepsilon)$ of a metric space
$(T,d)$ is the smallest integer $N$ such that $T$ can be covered by $N$ balls of radius $\varepsilon$.⁷

Equivalently, $N(T,d,\varepsilon)$ is the smallest number $N$ such that there exists a set $V \subset T$
with $\operatorname{card} V \le N$ and such that each point of $T$ is within distance $\varepsilon$ of $V$.

Let us denote by $\Delta(T) = \sup_{s,t\in T} d(s,t)$ the diameter of $T$ and observe that
$N(T,d,\Delta(T)) = 1$. We construct our approximating sets $T_n$ as follows: Consider
the largest integer $n_0$ with $\Delta(T) \le 2^{-n_0}$. For $n \ge n_0$, consider a set $T_n \subset T$ with
$\operatorname{card} T_n = N(T,d,2^{-n})$ such that each point of $T$ is within distance $2^{-n}$ of a point
of $T_n$.⁸ In particular $T_{n_0}$ consists of a single point.
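In practice one rarely knows $N(T,d,\varepsilon)$ exactly. A simple way to produce a set with the covering property is a greedy $\varepsilon$-net, sketched below in Python as an illustration (it is not a construction used in the text). The idea is to scan the points and keep each one that is farther than $\varepsilon$ from all points kept so far. The kept points are pairwise more than $\varepsilon$ apart, so the resulting set $V$ satisfies $N(T,d,\varepsilon) \le \operatorname{card} V \le N(T,d,\varepsilon/2)$. Using such nets in place of sets of cardinality exactly $N(T,d,2^{-n})$ should only cost a constant factor in the bounds. The helper `approximating_sets` is hypothetical.

```python
def greedy_net(points, dist, eps):
    """Greedy eps-net: keep a point if it is farther than eps from every point kept so far.

    Every point ends up within eps of a kept point, so N(T, d, eps) <= len(result);
    kept points are pairwise more than eps apart, so len(result) <= N(T, d, eps / 2).
    """
    centers = []
    for p in points:
        if all(dist(p, c) > eps for c in centers):
            centers.append(p)
    return centers


def approximating_sets(points, dist, n_min, n_max):
    """Hypothetical helper: greedy nets at scales 2**-n, standing in for the sets T_n."""
    return {n: greedy_net(points, dist, 2.0 ** -n) for n in range(n_min, n_max + 1)}


def dist(s, t):
    return abs(s - t)


T = list(range(1001))   # the points 0, 1, ..., 1000 with the usual distance
V = greedy_net(T, dist, 50)
print(len(V), V[:4])    # prints: 20 [0, 51, 102, 153]
```

Here each ball of radius 50 contains at most 101 of the points, so $N(T,d,50) \ge 10$, while the greedy net has 20 points. This is consistent with the two-sided bound above.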

We then perform the chaining as in the case of the Kolmogorov conditions, using
for $\pi_n(t)$ a point in $T_n$ with $d(t, \pi_n(t)) \le 2^{-n}$. Consider

$$U_n = \{(s,t)\,;\ s,t \in T_n,\ d(s,t) \le 3\cdot 2^{-n}\},$$

so that

$$\operatorname{card} U_n \le (\operatorname{card} T_n)^2 = N(T,d,2^{-n})^2.$$

This crude bound is hard to improve in general and should be compared to (1.4). We
now apply (1.13) to the r.v.s $V_i = |X_s - X_t|/(3\cdot 2^{-n})$ for $i = (s,t) \in U_n$. Since
$\mathbb{E}\varphi(V_i) \le 1$ by (1.17), we obtain that the r.v.

$$Y_n = \max\{|X_s - X_t|\,;\ (s,t) \in U_n\}$$

satisfies

$$\mathbb{E} Y_n \le 3\cdot 2^{-n}\,\varphi^{-1}\big(N(T,d,2^{-n})^2\big),$$

and exactly as in the case of the Kolmogorov conditions, we obtain

$$\mathbb{E}\sup_{d(s,t)\le 2^{-k}} |X_s - X_t| \le L\sum_{n\ge k} 2^{-n}\,\varphi^{-1}\big(N(T,d,2^{-n})^2\big),$$

where $L$ is a number (which may change between occurrences). We postpone the
exercise of writing this inequality in integral form as

$$\mathbb{E}\sup_{d(s,t)\le\delta} |X_s - X_t| \le L\int_0^\delta \varphi^{-1}\big(N(T,d,\varepsilon)^2\big)\,d\varepsilon. \tag{1.18}$$


⁷ Here our balls are closed balls. One could also use open balls in this definition. There seems to
be no universal agreement about this. For our purpose, it makes no difference whatsoever.

⁸ We do not require that $T_n \subset T_{n+1}$. In Sect. 1.3, it does happen that $G_n \subset G_{n+1}$, but this was not
really used in the arguments.




In the case of the function $\varphi(x) = \exp(x^2/4) - 1$, so that $\varphi^{-1}(x) = 2\sqrt{\log(1+x)}$,
inequality (1.18) is easily shown to be equivalent to the following
more elegant formulation:

Theorem 1.4.2 (Dudley’s Bound) If $(X_t)_{t\in T}$ is a Gaussian process with natural
distance $d$, then

$$\mathbb{E}\sup_{d(s,t)\le\delta} |X_s - X_t| \le L\int_0^\delta \sqrt{\log N(T,d,\varepsilon)}\,d\varepsilon. \tag{1.19}$$


This very general inequality is by far the most useful result on continuity of Gaussian 
processes. 


Exercise 1.4.3 Prove that the previous bound gives the correct uniform modulus of
continuity for Brownian motion on $[0,1]$: for $\delta \le 1$,

$$\mathbb{E}\sup_{|s-t|\le\delta} |B_s - B_t| \le L\sqrt{\delta\log(2/\delta)}.$$
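For Brownian motion the natural distance is $d(s,t) = |s-t|^{1/2}$. So $|s-t| \le \delta$ means $d(s,t) \le \sqrt\delta$, and a closed $d$-ball of radius $\varepsilon$ is an interval of length at most $2\varepsilon^2$, giving $N([0,1],d,\varepsilon) = \lceil 1/(2\varepsilon^2)\rceil$. The Python sketch below is a numerical illustration, not a solution of the exercise. It evaluates the dyadic sum $\sum_{n\ge k} 2^{-n}\sqrt{\log N(T,d,2^{-n})}$ with $2^{-k} = \sqrt\delta$, leaving out the constant $L$ and truncating the series. It then prints the ratio of this sum to $\sqrt{\delta\log(2/\delta)}$, and the ratio stays bounded as $\delta \to 0$.

```python
import math


def covering_number_bm(eps):
    """N([0, 1], d, eps) for d(s, t) = |s - t|**0.5: each closed d-ball of radius eps
    is an interval of length at most 2 * eps**2."""
    return max(1, math.ceil(1 / (2 * eps * eps)))


def dudley_sum(k, covering_number, n_terms=80):
    """Dyadic form of Dudley's bound, without the constant L:
    sum over n >= k of 2**-n * sqrt(log N(T, d, 2**-n)), truncated after n_terms terms."""
    return sum(2.0 ** -n * math.sqrt(math.log(covering_number(2.0 ** -n)))
               for n in range(k, k + n_terms))


def ratio(j):
    """Dudley sum for delta = 4**-j (so sqrt(delta) = 2**-j), divided by sqrt(delta log(2/delta))."""
    delta = 4.0 ** -j
    return dudley_sum(j, covering_number_bm) / math.sqrt(delta * math.log(2 / delta))


for j in (1, 5, 10, 20, 30):
    print(j, round(ratio(j), 3))
```

The extra factor $\sqrt{\log(2/\delta)}$ comes from the growth of $\sqrt{\log N(T,d,\varepsilon)}$ as $\varepsilon \to 0$. The dyadic sum is dominated by its first terms, so it behaves like $2^{-k}\sqrt{\log N(T,d,2^{-k})}$.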


The message of Chap. 2 is simple: 


• However useful, Dudley’s bound is not optimal in a number of fundamentally
important situations.

• It requires no more work to obtain a better bound which is optimal in every
situation.


1.5 Overall Plan of the Book 


A specific feature of the index set $T = [0,1]^m$ (provided with the Euclidean distance)
occurring in the Kolmogorov conditions is that it is really “$m$-dimensional”
and “the same around each point”. This is not the case for index sets which occur in
a great many natural situations. If one had to summarize in one sentence the content 
of the upper bounds presented in this book, it would be that they develop methods 
which are optimal even when this feature does not occur. 

The main tools are built in Parts I and II. Part I is devoted to the most important
situation we consider in the book, the study of Gaussian processes, and we learn the 
basic concepts on how to measure the “size” of a metric space. The effectiveness 
of the corresponding tools is then demonstrated by proving classical results on 
matchings. 

The goal of Part II is to extend the results of the Gaussian case to other more 
general processes. This program of building the proper tools to go beyond the 
Gaussian case was started by the author soon after he obtained his results on 
Gaussian processes (which are presented in Chap. 2). It is a significant endeavor 
which requires a number of new concepts. The most important of these is the idea
of families of distances. We can no longer entirely describe the situation using a
single distance on the index set (as is the case for Gaussian processes). In some
sense, this program has been completed. Most of the results which were dreamed of
by the author⁹ between 1985 and 1990 are now proved in Chap. 11.

Fig. 1.1 Dependence chart between chapters. Very marginal dependence is not indicated.

Part III explores situations which belong to the same circle of ideas but in diverse 
directions. The dependence chart between the chapters is given in Fig. 1.1. 


1.6 Does This Book Contain any Ideas? 


At this stage, it is not really possible to precisely describe any of the new ideas 
which will be presented, but if the following statements are not crystal clear to you, 
you may have something to learn from this book: 


Idea 1 It is possible to organize chaining optimally using increasing sequences of 
partitions. 


⁹ Including some which sounded like crazily optimistic conjectures!




Idea 2 There is an automatic device to construct such sequences of partitions, using 
“functionals”, quantities which measure the size of the subsets of the index set. This 
yields a complete understanding of boundedness of Gaussian processes. 


Idea 3 Ellipsoids are much smaller than one would think, because they (and, more 
generally, sufficiently convex bodies) are thin around the edges. This explains the 
funny fractional powers of logarithms in certain matching theorems. 


Idea 4 One may witness that a metric space is large by the fact that it contains large 
trees or equivalently that it supports an extremely scattered probability measure. 


Idea 5 Consider a set $T$ on which you are given a distance $d$ and a random distance
$d_\omega$ such that, given $s,t \in T$, it is rare that the distance $d_\omega(s,t)$ is much smaller
than $d(s,t)$. Then if in the appropriate sense $(T,d)$ is large, it must be the case
that $(T,d_\omega)$ is typically large. This principle enormously constrains the structure of
many bounded processes built on random series. 


Idea 6 There are different ways a random series might converge. It might converge 
because chaining witnesses that there is cancellation between terms, or it might 
converge because the sum of the absolute values of its terms already converges. 
Many processes built on random series can be split in two parts, each one converging 
according to one of the previous phenomena. 


The book contains many more ideas, but you will have to read more to discover 
them. 


1.7 Overview by Chapters 


1.7.1 Gaussian Processes and the Generic Chaining 


This subsection gives an overview of Chap. 2. More generally, Sect. 1.7.n gives the 
overview for Chapter n + 1. 

The most important question considered in this book is the boundedness of 
Gaussian processes. The key object is the metric space (T, d) where T is the index 
set and d the intrinsic distance (0.1). As investigated in Sect. 2.11, this metric space 
is far from being arbitrary: it is isometric to a subset of a Hilbert space. It is, however, 
a deadly trap to try to use this specific property of the metric space (T, d). The
proper approach is to just think of it as a general metric space. 

After reviewing some elementary facts, in Sect. 2.4 we explain the basic idea of
the “generic chaining”, one of the key ideas of this work. Chaining is a succession
of steps that provide successive approximations of the index space (T, d). In the
Kolmogorov chaining, for each n, the difference between the n-th and the (n + 1)-th
approximation of the process, which we call here “the variation of the process during
the n-th chaining step”, is “controlled uniformly over all possible chains”. Generic
chaining allows that the variation of the process during the n-th chaining step “may
depend on which chain we follow”. Once the argument is properly organized, it is
no more complicated than the classical argument. It is in fact exactly the same.
Yet, while Dudley’s classical bound is not always sharp, the bound obtained through
the generic chaining is optimal. Entropy numbers are reviewed in Sect. 2.5.

It is technically convenient to formulate the generic chaining bound using special
sequences of partitions of the metric space (T, d), which we shall call admissible
sequences throughout the book. The key to making the generic chaining bound useful
is then to be able to construct admissible sequences. These admissible sequences
measure an aspect of the “size” of the metric space and are introduced in Sect. 2.7.
In Sect. 2.8, we introduce another method to measure the “size” of the metric space,
through the behavior of certain “functionals”, which are simply numbers attached
to each subset of the entire space. The fundamental fact is that the two measures
of the size of the metric space one obtains either through admissible sequences or
through functionals are equivalent in full generality. This is proved in Sect. 2.8 for
the easy part (that the admissible sequence approach provides a larger measure of
size than the functional approach) and in Sect. 2.9 for the converse. This converse is,
in effect, an algorithm to construct sequences of partitions in a metric space given a
functional. Functionals are of considerable use throughout the book.
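To fix ideas, here is a minimal sketch that evaluates, for one given sequence of partitions of a finite metric space, the quantity $\sup_t\sum_{n\ge0}2^{n/2}\Delta(A_n(t))$, where $A_n(t)$ is the cell of the n-th partition containing t. It assumes the definition of admissible sequences given later in Sect. 2.7 (an increasing sequence of partitions with card $\mathcal A_0=1$ and card $\mathcal A_n\le2^{2^n}$ for $n\ge1$); the function names and the toy example are hypothetical illustrations, not the book's construction.

```python
from itertools import combinations

def diameter(cell, d):
    """Diameter of a finite set under the distance d (0 for a singleton)."""
    return max((d(s, t) for s, t in combinations(cell, 2)), default=0)

def is_admissible(parts):
    """Assumed definition: card A_0 = 1, card A_n <= 2**(2**n) for n >= 1,
    and each partition refines the previous one (increasing sequence)."""
    if len(parts[0]) != 1:
        return False
    if any(len(A) > 2 ** (2 ** n) for n, A in enumerate(parts) if n >= 1):
        return False
    return all(
        all(any(cell <= big for big in coarse) for cell in fine)
        for coarse, fine in zip(parts, parts[1:])
    )

def chaining_functional(parts, d):
    """sup over t of sum_n 2**(n/2) * diam(A_n(t)), A_n(t) = cell of A_n containing t."""
    points = set().union(*parts[0])

    def cell_of(t, A):
        return next(C for C in A if t in C)

    return max(
        sum(2 ** (n / 2) * diameter(cell_of(t, A), d) for n, A in enumerate(parts))
        for t in points
    )

def dist(s, t):
    return abs(s - t)

# T = {0,...,7} on the line: the whole set, then halves, then pairs, then points.
parts = [
    [frozenset(range(8))],
    [frozenset(range(4)), frozenset(range(4, 8))],
    [frozenset({2 * i, 2 * i + 1}) for i in range(4)],
    [frozenset({i}) for i in range(8)],
]
print(is_admissible(parts), chaining_functional(parts, dist))  # True, about 13.24 = 9 + 3*sqrt(2)
```

Every point here sees the same chain of diameters 7, 3, 1, 0, so the supremum is $7+3\sqrt2+2$; a different sequence of partitions would in general give a different value, and the generic chaining is concerned with the best one.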

In Sect. 2.10, we prove that the generic bound can be reversed for Gaussian
processes, therefore providing a characterization of their sample-boundedness.
Generic chaining entirely explains the size of Gaussian processes, and the dream 
of Sect. 2.12 is that a similar situation will occur for many processes. 

In Sect. 2.11, we explain why a Gaussian process in a sense is nothing but a subset 
of Hilbert space. Remarkably, a number of basic questions remain unanswered, such 
as how to relate through geometry the size of a subset of Hilbert space seen as a 
Gaussian process with the corresponding size of its convex hull. 

Dudley’s bound fails to explain the size of the Gaussian processes indexed by 
ellipsoids in Hilbert space. This is investigated in Sect. 2.13. Ellipsoids will play a 
basic role in Chap. 4. 


1.7.2 Trees and Other Measures of Size


We describe different notions of trees and show how one can measure the “size” of a 
metric space by the size of the largest trees it contains, in a way which is equivalent 
to the measures of size introduced in Chap. 2. This idea played an important part in 
the history of Gaussian processes. Its appeal is mostly that trees are easy to visualize. 
Building a large tree in a metric space is an efficient method to bound its size from 
below. We then learn a method of Fernique to measure the size of a metric space 
through certain properties of the probability measures on it. It will be amenable to 
vast generalizations. 
1.7.3 Matching Theorems


Chapter 4 makes the point that the generic chaining (or some equivalent form of it) is
already required to really understand the irregularities occurring in the distribution
of N points $(X_i)_{i\le N}$ independently and uniformly distributed in the unit square.
These irregularities are measured by the “cost” of pairing (=matching) these points
with N fixed points that are very uniformly spread, for various notions of cost.
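To make the notion of cost concrete, here is a brute-force sketch, usable only for very small N since it tries all N! bijections, that computes the cost of an optimal matching either as the average distance between paired points or as the maximal one. Taking the "very uniformly spread" points to be the centres of a regular grid is an illustrative assumption, and the function names are hypothetical.

```python
import itertools
import math
import random

def grid_centres(k):
    """The N = k*k centres of a regular k x k grid of the unit square."""
    return [((i + 0.5) / k, (j + 0.5) / k) for i in range(k) for j in range(k)]

def optimal_matching_cost(X, Y, kind="average"):
    """Minimal cost over all bijections pi: X -> Y, by exhaustive search.
    kind="average": (1/N) * sum_i |X_i - Y_pi(i)|
    kind="max":     max_i |X_i - Y_pi(i)|"""
    if kind not in ("average", "max"):
        raise ValueError("kind must be 'average' or 'max'")
    n = len(X)
    agg = (lambda ds: sum(ds) / n) if kind == "average" else max
    return min(
        agg([math.dist(X[i], Y[p[i]]) for i in range(n)])
        for p in itertools.permutations(range(n))
    )

rng = random.Random(0)
k = 2  # N = 4 points; exhaustive search becomes impractical very quickly
X = [(rng.random(), rng.random()) for _ in range(k * k)]
Y = grid_centres(k)
print(optimal_matching_cost(X, Y), optimal_matching_cost(X, Y, kind="max"))
```

The exhaustive search is chosen only for transparency; it says nothing about how these costs behave as N grows, which is the subject of the chapter.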

These optimal results involve mysterious powers of log N. We are able to trace 
them back to the geometry of ellipsoids in Hilbert space, so we start the chapter with 
an investigation of these ellipsoids in Sect. 4.1. The philosophy of the main result, 
the ellipsoid theorem, is that an ellipsoid is in some sense somewhat smaller than it 
appears at first. This is due to convexity: an ellipsoid gets “thinner” when one gets 
away from its center. The ellipsoid theorem is a special case of a more general result 
(with the same proof) about the structure of sufficiently convex bodies, one that will 
have important applications in Chap. 19. 

In Sect. 4.3, we provide general background on matchings. In Sect. 4.5, we
investigate the case where the cost of a matching is measured by the average distance
between paired points. We prove the result of Ajtai, Komlós and Tusnády that the
expected cost of an optimal matching is at most $L\sqrt{\log N}/\sqrt N$ where L is a number.
The factor $1/\sqrt N$ is simply a scaling factor, but the fractional power of log is
optimal as shown in Sect. 4.6. In Sect. 4.7, we investigate the case where the cost
of a matching is measured instead by the maximal distance between paired points.
We prove the theorem of Leighton and Shor that the expected cost of a matching is
at most $L(\log N)^{3/4}/\sqrt N$, and the power of log is shown to be optimal in Sect. 4.8.

With the exception of Sect. 4.1, the results of Chap. 4 are not connected to any 
subsequent material before Chap. 17. 


1.7.4 Warming Up with p-Stable Processes


With this chapter, we start the program of vastly extending the results of Chap. 2 
concerning Gaussian processes. We outline several of the fruitful methods on the 
class of p-stable processes, based on their property of being conditionally Gaussian. 


1.7.5 Bernoulli Processes 


Random signs are obviously important r.v.s and occur frequently in connection
with “symmetrization procedures”, a very useful tool. In a Bernoulli process, the
individual random variables $X_t$ are linear combinations of independent random
signs. Each Bernoulli process is associated with a Gaussian process in a canonical
manner, when one replaces the random signs by independent standard Gaussian
r.v.s. The Bernoulli process has better tails than the corresponding Gaussian process
(it is “sub-Gaussian”) and is bounded whenever the corresponding Gaussian process
is bounded. There is, however, a completely different reason for which a Bernoulli
process might be bounded, namely, that the sum of the absolute values of the
coefficients of the random signs remains bounded independently of the index t. A
natural question is then to decide whether these two extreme situations are the only
fundamental reasons why a Bernoulli process can be bounded, in the sense that a
suitable “mixture” of them occurs in every bounded Bernoulli process. This was
the “Bernoulli conjecture” (to be stated formally on page 179), which has been so
brilliantly solved by W. Bednorz and R. Latała.
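For a finite index set and a few signs, the expectation of the supremum of a Bernoulli process $X_t=\sum_i a_{t,i}\varepsilon_i$ can be computed exactly by averaging over all $2^n$ equally likely sign patterns. The sketch below does so with exact rational arithmetic and compares the result with $\sup_t\sum_i|a_{t,i}|$, the quantity behind the second, "no cancellation" reason for boundedness. The coefficient arrays and function names are made up for illustration; the comparison with the associated Gaussian process is not computed here.

```python
from fractions import Fraction
from itertools import product

def expected_sup_bernoulli(coeffs):
    """Exact E sup_t sum_i a[t][i] * eps_i over all 2**n equally likely sign vectors."""
    n = len(coeffs[0])
    total = Fraction(0)
    for eps in product((-1, 1), repeat=n):
        total += max(sum(Fraction(a) * e for a, e in zip(row, eps)) for row in coeffs)
    return total / 2 ** n

def l1_bound(coeffs):
    """sup_t sum_i |a[t][i]|: bounds sup_t X_t for every choice of signs, with no cancellation used."""
    return max(sum(abs(Fraction(a)) for a in row) for row in coeffs)

A = [[1, 1], [1, -1]]
print(expected_sup_bernoulli(A), l1_bound(A))  # 1 2
```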

It is a long road to the solution of the Bernoulli conjecture, and we start to build 
the main tools bearing on Bernoulli processes. A linear combination of independent 
random signs looks like a Gaussian r.v. when the coefficients of the random signs 
are small. We can expect that a Bernoulli process will look like a Gaussian process 
when these coefficients are suitably small. This is a fundamental idea: the key to 
understanding Bernoulli processes is to reduce to situations where these coefficients 
are small. 

The Bernoulli conjecture, on which the author worked for so many years, greatly
influenced the way he looked at various processes. In the case of empirical
processes, this is explained in Sect. 6.8.


1.7.6 Random Fourier Series and Trigonometric Sums 


The basic example of a random Fourier series is 


$$X_t=\sum_{k\ge1}\xi_k\exp(2\pi ikt)\,,\tag{1.20}$$

where $i^2=-1$, $t\in[0,1]$ and the r.v.s $\xi_k$ are independent and symmetric. In this
chapter, we provide a final answer to the question of the convergence of such series.
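Numerically one can only look at partial sums of (1.20). The sketch below evaluates $\sum_{k\le n}\xi_k\exp(2\pi ikt)$ on a grid of values of t, in the Gaussian case $\xi_k=a_kg_k$. The choice $a_k=1/k$, the truncation at 200 terms, the grid size and the function names are arbitrary assumptions for illustration, and the maximum over a finite grid is only a lower estimate of the supremum over [0, 1].

```python
import cmath
import random

def partial_sum(xi, t):
    """Truncation of (1.20): sum over k = 1..len(xi) of xi_k * exp(2 pi i k t)."""
    return sum(x * cmath.exp(2j * cmath.pi * k * t) for k, x in enumerate(xi, start=1))

def sup_modulus_on_grid(xi, m=512):
    """max of |X_t| over t in {0, 1/m, ..., (m-1)/m} (a lower estimate of the sup over [0, 1])."""
    return max(abs(partial_sum(xi, j / m)) for j in range(m))

rng = random.Random(1)
a = [1 / k for k in range(1, 201)]             # hypothetical coefficients a_k
xi = [ak * rng.gauss(0.0, 1.0) for ak in a]    # xi_k = a_k g_k with independent standard g_k
print(sup_modulus_on_grid(xi))
```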

The fundamental case where $\xi_k=a_kg_k$, for numbers $a_k$ and independent
Gaussian r.v.s $(g_k)$, is of great historical importance. There is, however, another
motivation for the study of such series. The generic chaining and related methods
are well adapted to the case of a “nonhomogeneous index space”. The study of
certain of the processes we will consider in the next chapters is already subtle even
in the absence of the extra difficulty due to this lack of homogeneity. The setting of
random Fourier series allows us to put aside the issue of lack of homogeneity and to
concentrate on the other difficulties; it played a great part in the development of the
theory. It provides an ideal setting to understand a basic fact: many processes can be
exactly controlled, not by using one or two distances, but by using an entire family
of distances. This concept of “family of distances” will play a major role later. It is
also while analyzing the lower bounds discovered in the setting of random Fourier
series that the author discovered the method which allows one to extend these bounds
to general random series as explained in Chap. 11. In this chapter, we also meet our
first “decomposition theorem”: there are two distinct reasons which explain the size
of a random trigonometric sum. First, there can be a lot of cancellation between
the terms. Second, it may happen that the sum of the absolute values of the terms
is already small. We show that every random trigonometric sum is the sum of two
such pieces, one of each type.


1.7.7 Partition Scheme for Families of Distances 


Once one has survived the initial surprise of the new idea that many processes are 
naturally associated with an entire family of distances, it is very pleasant to realize 
that the tools of Sect. 2.9 can be extended to this setting with essentially the same
proof. This is the purpose of Sect. 8.1. 

In Sect. 8.3, we apply these tools to the situation of “canonical processes” where
the r.v.s $X_t$ are linear combinations of independent copies of symmetric r.v.s with
density proportional to $\exp(-|x|^\alpha)$ where $\alpha\ge1$ (and to considerably more general
situations as discovered by R. Latała). In these situations, the size of the process
can be completely described from the geometry of the index space, a far-reaching
extension of the Gaussian case.


1.7.8 Peaky Parts of Functions


We learn how to measure the size of sets of functions on a measured space using an 
appropriate family of distances. We show that when we control this size, for each 
function of the set, we can distinguish its “peaky part” in a coherent way over the 
whole set of functions which then has in a sense a simple structure, as it is built from 
simpler pieces. 


1.7.9 Proof of the Bernoulli Conjecture 


Having learned how to manipulate “families of distances”, we are now better
prepared to prove the Bernoulli conjecture. This is the (overwhelmingly important)
Latała-Bednorz theorem. The challenging proof occupies most of Chap. 10.¹⁰ In the
last section, we investigate how to get lower bounds on Bernoulli processes using
“witnessing measures”.


¹⁰ It is a good research program to discover a more intuitive approach to this result.


1.7.10 Random Series of Functions


For a large class of random series of functions, we prove in full generality that 
chaining explains all the part of the boundedness of these processes created by 
cancellations, in the spirit of the Bernoulli conjecture. This covers the cases both 
of empirical processes and of the closely related class of selector processes. Our 
main tool is to reduce to processes which are conditionally Bernoulli processes and
to use the Latała-Bednorz theorem and its consequences.


1.7.11 Infinitely Divisible Processes 


The infinitely divisible processes we study are indexed by a general set and are
to Lévy processes what a general Gaussian process (indexed by an arbitrary index
set) is to Brownian motion (a Gaussian process indexed by ℝ with stationary
increments). We extend to these processes our results on random series of functions:
chaining explains all the part of the boundedness of these processes which is due to 
cancellations. The results are described in complete detail with all definitions in 
Sect. 12.3. 


1.7.12 Unfulfilled Dreams 


Having proved in several general settings that “chaining explains all the part of 
the boundedness which is due to cancellation”, we concentrate on the problem of 
describing the “part of the boundedness which owes nothing to cancellation”. We 
propose sweeping conjectures. The underlying hope behind these conjectures is that,
ultimately, a bound for a selector process always arises from the use of the “union
bound” $P(\bigcup_nA_n)\le\sum_nP(A_n)$ in a simple situation, the use of basic principles
such as linearity and positivity, or combinations of these.


1.7.13 Empirical Processes 


We focus on a special yet fundamental topic: the control of the supremum of the 
empirical process over a class of functions. 

We demonstrate again the power of the chaining scheme of Sect. 9.4 by providing 
a sharper version of Ossiander’s bracketing theorem with a very simple proof. We 
then illustrate various techniques by presenting proofs of two deep recent results. 
1.7.14 Gaussian Chaos


Our satisfactory understanding of the properties of Gaussian processes should 
bring information about processes that are, in various senses, related to Gaussian 
processes. Such is the case of an order 2 Gaussian chaos (which is essentially a 
family of second-degree polynomials of Gaussian random variables). It seems at 
present a hopelessly difficult task to give lower and upper bounds of the same order 
for these processes, but in Sect. 15.1, we obtain a number of results in this direction. 
Chaos processes are very instructive because there exist other methods than chaining 
to control their size (a situation which we do not expect to occur for processes 
defined as sums of a random series). 

In Sect. 15.2, we study the tails of a single multiple-order Gaussian chaos and
present (yet another) deep result of R. Latała which provides a rather complete
description of the size of these tails.


1.7.15 Convergence of Orthogonal Series: Majorizing Measures


The old problem of characterizing the sequences $(a_m)$ such that for each orthonormal
sequence $(\varphi_m)$ the series $\sum_m a_m\varphi_m$ converges a.s. was solved by A.
Paszkiewicz. Using a more abstract point of view, we present a very much simplified
proof of his results (due essentially to W. Bednorz). This leads us to the question
of discussing when a certain condition on the “increments” of a process implies
its boundedness. When the increment condition is of “polynomial type”, this is
more difficult than in the case of Gaussian processes and requires the notion of
“majorizing measure”. We present several elegant results of this theory, in their
seemingly final forms recently obtained by W. Bednorz.


1.7.16 Shor’s Matching Theorem 


This chapter continues Chap. 4. We prove a deep improvement of the
Ajtai-Komlós-Tusnády theorem due to P. Shor. Unfortunately, due mostly to our lack of
geometrical understanding, the best conceivable matching theorem, which would
encompass this result as well as those of Chap. 4, and much more, remains a
challenging problem, “the ultimate matching conjecture” (a conjecture which is
solved in the next chapter in dimension ≥ 3).
1.7.17 The Ultimate Matching Theorem in Dimension Three


In this case, which is easier than the case of dimension two (but still apparently 
rather non-trivial), we are able to obtain the seemingly final result about matchings, 
a strong version of “the ultimate matching conjecture”. There are no more fractional 
powers of log N here, but in a random sample of N points uniformly distributed in
$[0,1]^3$, local irregularities occur at all scales between $N^{-1/3}$ and $(\log N)^{1/3}N^{-1/3}$,
and our result can be seen as a precise global description of these irregularities.


1.7.18 Applications to Banach Space Theory 


Chapter 19 gives applications to Banach space theory. As interest in this theory 
has decreased in recent years, we have not reproduced many of the results of 
[132], and we urge the interested reader to consult this earlier edition. We have 
kept only the results which make direct use of results presented elsewhere in 
the book (rather than including results based on the methods of the book). In 
Sect. 19.1.2, we study the cotype of operators from $\ell_\infty^N$ into a Banach space. In
Sect. 19.1.3, we prove a comparison principle between Rademacher (=Bernoulli) 
and Gaussian averages of vectors in a finite-dimensional Banach space, and we use 
it to compute the Rademacher cotype-2 of a finite-dimensional space using only a 
few vectors. In Sect. 19.2.1 we discover how to classify the elements of the unit
ball of $L^1$ “according to the size of the level sets”. In Sect. 19.2.3 we explain, given
a 1-unconditional sequence $(e_i)_{i\le N}$ in a Banach space E, how to “compute” the
quantity $E\|\sum_ig_ie_i\|$ when the $g_i$ are independent Gaussian r.v.s, a further variation
on the fundamental theme of the interplay between the $L^1$, $L^2$ and $L^\infty$ norms.
In Sect. 19.3.1 we study the norm of the restriction of an operator from ae to 
the subspace generated by a randomly chosen small proportion of the coordinate 
vectors, and in Sect. 19.3.2 we use these results to deduce the celebrated results of J.
Bourgain on the $\Lambda_p$ problem. Recent results of Gilles Pisier on Sidon sets conclude
this chapter in Sect. 19.4. 


Part I 
The Generic Chaining 


Chapter 2
Gaussian Processes and the Generic Chaining


2.1 Overview 


The overview of this chapter is given in Chap. 1, Sect. 1.7.1. More generally,
Sect. 1.7.n is the overview of Chapter n + 1.


2.2 Measuring the Size of the Supremum 


In this section, we consider a metric space (T, d) and a process $(X_t)_{t\in T}$. Unless
explicitly specified otherwise (and even when we forget to repeat it), we will always
assume that the process is centered, i.e.,


$$\forall t\in T,\quad EX_t=0.\tag{2.1}$$


We will measure the “size of the process $(X_t)_{t\in T}$” by the quantity $E\sup_{t\in T}X_t$.
Why this quantity is a good measure of the “size of the process” is explained in
Lemma 2.2.1.

When T is uncountable, it is not obvious what the quantity $E\sup_{t\in T}X_t$ means.
We define it by the following formula¹

$$E\sup_{t\in T}X_t=\sup\Big\{E\sup_{t\in F}X_t\,;\ F\subset T,\ F\text{ finite}\Big\},\tag{2.2}$$

where the right-hand side makes sense as soon as each r.v. $X_t$ is integrable. This will
be the case in almost all the situations considered in this book.


¹ Such questions are treated in detail, for example, in [53] pages 42-43.
Let us say that a process $(X_t)_{t\in T}$ is symmetric if it has the same law as the process
$(-X_t)_{t\in T}$. Almost all the processes we shall consider are symmetric (although this
hypothesis is not necessary for some of our results). The following lemma justifies
using the quantity $E\sup_tX_t$ to measure “the size of a symmetric process”:


Lemma 2.2.1 If the process $(X_t)_{t\in T}$ is symmetric, then

$$E\sup_{s,t\in T}|X_s-X_t|=2E\sup_{t\in T}X_t\,.$$


Proof We note that

$$\sup_{s,t\in T}|X_s-X_t|=\sup_{s,t\in T}(X_s-X_t)=\sup_{s\in T}X_s+\sup_{t\in T}(-X_t)\,,$$

and we take expectations, using that $(-X_t)_{t\in T}$ has the same law as $(X_t)_{t\in T}$.² □
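The identity of Lemma 2.2.1 can be checked exactly on a small example. A Bernoulli process $X_t=\sum_ia_{t,i}\varepsilon_i$ is symmetric, since changing every $\varepsilon_i$ into $-\varepsilon_i$ does not change the law of the signs, so enumerating all sign patterns with exact rational arithmetic must give equality. The coefficients and the function name are arbitrary choices for illustration.

```python
from fractions import Fraction
from itertools import product

def expectations(coeffs):
    """Return (E sup_{s,t} |X_s - X_t|, E sup_t X_t) for X_t = sum_i a[t][i] * eps_i."""
    n = len(coeffs[0])
    lhs = rhs = Fraction(0)
    for eps in product((-1, 1), repeat=n):
        X = [sum(Fraction(a) * e for a, e in zip(row, eps)) for row in coeffs]
        lhs += max(abs(x - y) for x in X for y in X)
        rhs += max(X)
    return lhs / 2 ** n, rhs / 2 ** n

coeffs = [[2, -1, 0, 3], [1, 1, 1, 1], [0, -2, 5, 1]]
lhs, rhs = expectations(coeffs)
print(lhs, 2 * rhs, lhs == 2 * rhs)  # the last value is True, as Lemma 2.2.1 predicts
```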


Exercise 2.2.2 Consider a symmetric process $(X_t)_{t\in T}$. Given any $t_0$ in T, prove
that

$$E\sup_{t\in T}|X_t|\le2E\sup_{t\in T}X_t+E|X_{t_0}|\le3E\sup_{t\in T}|X_t|\,.\tag{2.3}$$


The previous exercise is easy, but this will not always be the case. The author
has never taught this material in a classroom and cannot really evaluate the level of
difficulty of the exercises for a beginner. So please do not feel discouraged if most
of the exercises feel like research problems.³ A sketch of a solution is provided for
almost every exercise. For the exercises which are too difficult, understanding this
very concise sketch is in itself a good exercise. Just try to peek at the solution one
line at a time.

In this book, we often state inequalities about the supremum of a symmetric
process using the quantity $E\sup_{t\in T}X_t$ simply because this quantity looks
typographically more elegant than the equivalent⁴ quantity $E\sup_{s,t\in T}|X_s-X_t|$. It is
good to remember that when $X_{t_0}=0$ for some $t_0\in T$, (2.3) shows that there is not
much difference between $E\sup_{t\in T}X_t$ and $E\sup_{t\in T}|X_t|$.

We actually often need to control the tails of the r.v. $\sup_{s,t\in T}|X_s-X_t|$, not only
its first moment. Emphasis is given to the first moment because this is the difficult
part, and once this is achieved, control of higher moments is often provided by the
same arguments.


² To be really rigorous, we should first consider the case where T is finite and then appeal to (2.2),
but it is better to skip this kind of tedious detail.

³ I had feedback from talented readers who felt that way. Consequently, I did not shy away from stating
as “exercises” rather non-trivial material complementing the text, while being fully aware that one
has to have achieved a rather complete understanding of the concepts as well as a mastery of the
techniques to solve them.

⁴ Equivalent does not mean equal; we have been dropping a factor 2 here. Generally speaking,
the methods of this book are not appropriate to find sharp numerical constants, and all the crucial
inequalities are “sharp within a multiplicative constant”.


2.3 The Union Bound and Other Basic Facts


From now on, we assume that the process $(X_t)_{t\in T}$ satisfies the increment condition:

$$\forall u>0,\quad P(|X_s-X_t|\ge u)\le2\exp\Big(-\frac{u^2}{2d(s,t)^2}\Big)\,,\tag{2.4}$$

where d is a distance on T. In particular this is the case when $(X_t)_{t\in T}$ is a Gaussian
process and $d(s,t)^2=E(X_s-X_t)^2$. Our goal is to find bounds on $E\sup_{t\in T}X_t$
depending on the structure of the metric space (T, d). We will assume that T is
finite, which, as shown by (2.2), involves no loss of generality.

Given any $t_0$ in T, the centering hypothesis (2.1) implies

$$E\sup_{t\in T}X_t=E\sup_{t\in T}(X_t-X_{t_0})\,.\tag{2.5}$$


The latter form has the advantage that we now seek estimates for the expectation of
the nonnegative r.v. $Y=\sup_{t\in T}(X_t-X_{t_0})$. For such a variable, we have the formula

$$EY=\int_0^\infty P(Y\ge u)\,du\,.\tag{2.6}$$


Let us note that since the function $u\mapsto P(Y\ge u)$ is non-increasing, for any u > 0,
we have the following:

$$EY\ge uP(Y\ge u)\,.\tag{2.7}$$


In particular $P(Y\ge u)\le EY/u$, a very important fact known as Markov’s
inequality. Arguments such as the following one will be of constant use:


Exercise 2.3.1 Consider a r.v. $Y\ge0$ and $a>0$. Prove that $P(Y\le aEY)\ge1-1/a$.


Let us stress a consequence of Markov’s inequality: suppose that $Y\ge0$ is a kind of random
error, of very small expectation, $EY=b^2$ where b is small. Then most of the time Y is
small: $P(Y\le b)\ge1-b$.
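For a r.v. taking finitely many nonnegative values, (2.6) reduces to a finite sum, because $u\mapsto P(Y\ge u)$ is constant on each interval between consecutive values. The sketch below checks (2.6) and Markov's inequality (2.7) with exact rational arithmetic on a made-up distribution; the function names are hypothetical.

```python
from fractions import Fraction as F

def mean(dist):
    """E Y for a finite distribution given as {value: probability}."""
    return sum(v * p for v, p in dist.items())

def tail(dist, u):
    """P(Y >= u)."""
    return sum(p for v, p in dist.items() if v >= u)

def tail_integral(dist):
    """Integral over [0, oo) of P(Y >= u), for Y >= 0 with finitely many values:
    on (v_{j-1}, v_j] we have P(Y >= u) = P(Y >= v_j), and it vanishes past the largest value."""
    total, prev = F(0), F(0)
    for v in sorted(dist):
        total += (v - prev) * tail(dist, v)
        prev = v
    return total

Y = {F(0): F(1, 2), F(1): F(1, 4), F(4): F(1, 4)}
print(tail_integral(Y), mean(Y))  # 5/4 5/4, as (2.6) asserts
print(all(u * tail(Y, u) <= mean(Y) for u in (F(1, 2), F(1), F(3), F(4))))  # True: (2.7)
```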

According to (2.6), it is natural to look for bounds of

$$P\Big(\sup_{t\in T}(X_t-X_{t_0})\ge u\Big)\,.\tag{2.8}$$
The first bound that comes to mind is the “union bound”

$$P\Big(\sup_{t\in T}(X_t-X_{t_0})\ge u\Big)\le\sum_{t\in T}P(X_t-X_{t_0}\ge u)\,.\tag{2.9}$$


It seems worthwhile to immediately draw some consequences from this bound and
to discuss at leisure a number of other simple, yet fundamental facts. This will take
a bit over three pages, after which we will come back to the main story of bounding
Y. Throughout this work, Δ(T) denotes the diameter of T,

$$\Delta(T)=\sup_{t_1,t_2\in T}d(t_1,t_2)\,.\tag{2.10}$$


When we need to make clear which distance we use in the definition of the diameter,
we will write Δ(T, d) rather than Δ(T). Consequently (2.4) and (2.9) imply

$$P\Big(\sup_{t\in T}(X_t-X_{t_0})\ge u\Big)\le2\,\mathrm{card}\,T\exp\Big(-\frac{u^2}{2\Delta(T)^2}\Big)\,.\tag{2.11}$$

Let us now record a simple yet important computation, which will allow us to use 
the information (2.11). 


Lemma 2.3.2 Consider a r.v. $Y\ge0$ which satisfies

$$\forall u>0,\quad P(Y\ge u)\le A\exp\Big(-\frac{u^2}{B^2}\Big)\tag{2.12}$$

for certain numbers $A\ge2$ and $B>0$. Then

$$EY\le LB\sqrt{\log A}\,.\tag{2.13}$$


Here, as in the entire book, L denotes a universal constant.⁵ We make the
convention that this constant is not necessarily the same on each occurrence (even
in the same equation). This should be remembered at all times. One of the benefits
of the convention (as opposed to writing explicit constants) is to make clear that
one is not interested in getting sharp constants. Getting sharp constants might be
useful for certain applications, but it is a different game.⁶ The convention is very
convenient, but one needs to get used to it. Now is the time for this, so we urge the
reader to pay the greatest attention to the next exercise.


⁵ When meeting an unknown notation such as this previous L, the reader might try to look at the
index, where some of the most common notation is recorded.

⁶ Our methods here are not appropriate for this.


Exercise 2.3.3


(a) Prove that for $x,y\in\mathbb R^+$ we have $xy-Lx^3\le Ly^{3/2}$. (Please understand this
statement as follows: given a number $L_1$, there exists a number $L_2$ such that for
all $x,y\in\mathbb R^+$ we have $xy-L_1x^3\le L_2y^{3/2}$.)

(b) Consider a function $p(u)\le1$ for $u\ge0$. Assume that for $u\ge L$, we have
$p(u)\le L\exp(-u^2/L)$. Prove that for all $u\ge0$ we have $p(Lu)\le2\exp(-u^2)$.
(Of course, this has to be understood as follows: assume that for a certain
number $L_1$, for $u\ge L_1$, we have $p(u)\le L_1\exp(-u^2/L_1)$. Prove that there
exists a number $L_2$ such that for all $u\ge0$ we have $p(L_2u)\le2\exp(-u^2)$.)

(c) Consider an integer $N\ge1$. Prove that

$$N^L\exp\big(-(\log N)^{3/2}/L\big)\le L\exp\big(-(\log N)^{3/2}/L\big)\,.$$


Proof of Lemma 2.3.2 We use (2.6), and we observe that since $P(Y\ge u)\le1$, for
any number $u_0>0$, we have

$$\begin{aligned}
EY&=\int_0^{u_0}P(Y\ge u)\,du+\int_{u_0}^\infty P(Y\ge u)\,du\\
&\le u_0+\int_{u_0}^\infty A\exp\Big(-\frac{u^2}{B^2}\Big)du\\
&\le u_0+\frac1{u_0}\int_{u_0}^\infty uA\exp\Big(-\frac{u^2}{B^2}\Big)du\\
&=u_0+\frac{AB^2}{2u_0}\exp\Big(-\frac{u_0^2}{B^2}\Big)\,.
\end{aligned}\tag{2.14}$$

The choice of $u_0=B\sqrt{\log A}$ gives the bound

$$B\sqrt{\log A}+\frac{B}{2\sqrt{\log A}}\le LB\sqrt{\log A}$$

since $A\ge2$. □
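The last step can be made fully explicit. Since $A\ge2$, we have $B/(2\sqrt{\log A})=B\sqrt{\log A}/(2\log A)\le B\sqrt{\log A}/(2\log2)$, so (2.13) holds for instance with $L=1+1/(2\log2)\approx1.72$. The short check below, with arbitrary sample values of A and B and a hypothetical function name, evaluates the right-hand side of (2.14) at $u_0=B\sqrt{\log A}$ and compares it with the closed form and with this explicit constant.

```python
import math

def rhs_2_14(A, B, u0):
    """u0 + (A B^2 / (2 u0)) * exp(-u0^2 / B^2), the right-hand side of (2.14)."""
    return u0 + A * B ** 2 / (2 * u0) * math.exp(-(u0 ** 2) / B ** 2)

L = 1 + 1 / (2 * math.log(2))
for A in (2, 10, 1e6):
    for B in (0.5, 1.0, 3.0):
        u0 = B * math.sqrt(math.log(A))
        value = rhs_2_14(A, B, u0)
        closed_form = B * math.sqrt(math.log(A)) + B / (2 * math.sqrt(math.log(A)))
        assert math.isclose(value, closed_form, rel_tol=1e-12)
        assert value <= L * B * math.sqrt(math.log(A)) * (1 + 1e-12)
print("explicit constant L =", L)
```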


Next, recalling that the process $(X_t)_{t\in T}$ is assumed to satisfy (2.4) throughout
the section, we claim that

$$E\sup_{t\in T}X_t\le L\Delta(T)\sqrt{\log\mathrm{card}\,T}\,.\tag{2.15}$$

Indeed, this is obvious if card T = 1. If card T ≥ 2, it follows from (2.11) that (2.12)
holds for $Y=\sup_{t\in T}(X_t-X_{t_0})$ with $A=2\,\mathrm{card}\,T$ and $B=\sqrt2\,\Delta(T)$, and the result
follows from (2.13) since $\log(2\,\mathrm{card}\,T)\le2\log\mathrm{card}\,T$ and $EY=E\sup_{t\in T}X_t$.
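As an illustration, a Bernoulli process $X_t=\sum_ia_{t,i}\varepsilon_i$ satisfies (2.4) with $d(s,t)=\|a_s-a_t\|_2$; this is the classical Hoeffding (sub-Gaussian) inequality, which we take for granted here. Following the proof above with $A=2\,\mathrm{card}\,T$, $B=\sqrt2\,\Delta(T)$ and $u_0=B\sqrt{\log A}$ gives the explicit bound $E\sup_tX_t\le B\sqrt{\log A}+B/(2\sqrt{\log A})$. The sketch compares it with the exact value obtained by enumerating all sign patterns; the coefficients and function names are arbitrary, and card T ≥ 2 is assumed.

```python
import math
from itertools import combinations, product

def exact_expected_sup(coeffs):
    """E sup_t sum_i a[t][i] * eps_i, averaging over all 2**n sign vectors."""
    n = len(coeffs[0])
    patterns = list(product((-1, 1), repeat=n))
    return sum(
        max(sum(a * e for a, e in zip(row, eps)) for row in coeffs) for eps in patterns
    ) / len(patterns)

def explicit_bound(coeffs):
    """B sqrt(log A) + B / (2 sqrt(log A)) with A = 2 card T and B = sqrt(2) * Delta(T),
    Delta(T) being the diameter for the Euclidean distance between coefficient vectors."""
    diam = max(math.dist(s, t) for s, t in combinations(coeffs, 2))
    A, B = 2 * len(coeffs), math.sqrt(2) * diam
    return B * math.sqrt(math.log(A)) + B / (2 * math.sqrt(math.log(A)))

coeffs = [[1, 0, 2, -1, 0, 1], [0, 1, 1, 1, -1, 0], [2, -1, 0, 0, 1, 1], [-1, -1, 1, 0, 0, 2]]
print(exact_expected_sup(coeffs), "<=", explicit_bound(coeffs))
```

On such small examples the bound is far from tight; its point is the dependence on Δ(T) and card T only.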
The following special case is fundamental:


Lemma 2.3.4 If $(g_k)_{k\ge1}$ are standard Gaussian r.v.s, then

$$E\sup_{k\le N}g_k\le L\sqrt{\log N}\,.\tag{2.16}$$
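A quick Monte Carlo experiment illustrates (2.16) for independent standard Gaussians. The comparison value $\sqrt{2\log N}$ is a classical explicit form of the bound (that is, (2.16) with $L=\sqrt2$), quoted here without proof; the number of trials, the seed and the function name are arbitrary, and the printed numbers are estimates, not exact values.

```python
import math
import random

def mc_expected_max(N, trials=1000, seed=0):
    """Monte Carlo estimate of E max_{k <= N} g_k for independent standard Gaussians g_k."""
    rng = random.Random(seed)
    return sum(max(rng.gauss(0.0, 1.0) for _ in range(N)) for _ in range(trials)) / trials

for N in (10, 100, 500):
    print(N, round(mc_expected_max(N), 3), round(math.sqrt(2 * math.log(N)), 3))
```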


Exercise 2.3.5
(a) Prove that (2.16) holds for any r.v.s $g_k$ which satisfy

$$P(g_k\ge u)\le2\exp\Big(-\frac{u^2}{2}\Big)\tag{2.17}$$

for u > 0.

(b) For N ≥ 2, construct N centered r.v.s $(g_k)_{k\le N}$ satisfying (2.17) and taking only
the values 0, $\pm\sqrt{\log N}$ and for which $E\sup_{k\le N}g_k\ge\sqrt{\log N}/L$. (You are not
yet asked to make these r.v.s independent.)

(c) After learning (2.18), solve (b) with the further requirement that the r.v.s $g_k$ are
independent. If this is too hard, look at Exercise 2.3.7 (b).


This is taking us a bit ahead, but an equally fundamental fact is that when the r.v.s
$(g_k)$ are jointly Gaussian and “significantly different from each other”, i.e.,
$E(g_k-g_\ell)^2\ge a^2>0$ for $k\ne\ell$, the bound (2.16) can be reversed, i.e.,
$E\sup_{k\le N}g_k\ge a\sqrt{\log N}/L$, a fact known as Sudakov’s minoration. Sudakov’s minoration is a
non-trivial fact, and to understand it, it is really useful to solve Exercise 2.3.7.
However, before that, let us point out a simple fact, which will be used many times.


Exercise 2.3.6 Consider independent events $(A_k)_{k\ge1}$. Prove that

$$P\Big(\bigcup_{k\le N}A_k\Big)\ge1-\exp\Big(-\sum_{k\le N}P(A_k)\Big)\,.\tag{2.18}$$

In words, independent events such that the sum of their probabilities is small are
basically disjoint.
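For independent events, $P(\bigcup_kA_k)=1-\prod_k(1-P(A_k))$. The sketch below uses this formula to check (2.18) numerically, together with the union bound, so that $P(\bigcup_kA_k)$ is seen to lie between $1-\exp(-\sum_kP(A_k))$ and $\sum_kP(A_k)$; when the sum is small these two quantities nearly coincide. The random test probabilities and the function name are arbitrary.

```python
import math
import random

def union_probability(probs):
    """P(union of independent events A_k) = 1 - prod_k (1 - P(A_k))."""
    return 1 - math.prod(1 - p for p in probs)

rng = random.Random(2)
for _ in range(1000):
    probs = [rng.random() * rng.choice((0.01, 0.1, 1.0)) for _ in range(rng.randint(1, 30))]
    s = sum(probs)
    lower = 1 - math.exp(-s)                     # right-hand side of (2.18)
    assert lower - 1e-12 <= union_probability(probs) <= min(1.0, s) + 1e-12
print("checked (2.18) and the union bound on 1000 random examples")
```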


Exercise 2.3.7

(a) Consider independent r.v.s $Y_k\ge0$ and $u>0$ with

$$\sum_{k\le N}P(Y_k\ge u)\ge1\,.\tag{2.19}$$

Prove that

$$E\sup_{k\le N}Y_k\ge\frac uL\,.$$

Hint: Use (2.18) to prove that $P(\sup_{k\le N}Y_k\ge u)\ge1/L$.
(b) We assume (2.19), but now $Y_k$ need not be ≥ 0. Prove that

$$E\sup_{k\le N}Y_k\ge\frac uL-E|Y_1|\,.$$

Hint: Observe that for each event Ω, we have $E(\mathbf 1_\Omega\sup_{k\le N}Y_k)\ge-E|Y_1|$.
(c) Prove that if $(g_k)_{k\ge1}$ are independent standard Gaussian r.v.s, then
$E\sup_{k\le N}g_k\ge\sqrt{\log N}/L$.


Before we go back to our main story, we consider in detail the consequences 
of an “exponential decay of tails” such as in (2.12). This is the point of the next 
exercise. 


Exercise 2.3.8 


(a) Assume that for a certain B > 0, the r.v. $Y\ge0$ satisfies

$$\forall u>0,\quad P(Y\ge u)\le2\exp\Big(-\frac uB\Big)\,.\tag{2.20}$$

Prove that

$$E\exp\Big(\frac{Y}{2B}\Big)\le L\,.\tag{2.21}$$

Prove that for x, a > 0 one has $(x/a)^a\le\exp x$. Use this for a = p and
x = Y/2B to deduce from (2.21) that for p ≥ 1 one has

$$(EY^p)^{1/p}\le LpB\,.\tag{2.22}$$

(b) Assuming now that for a certain B > 0 one has

$$\forall u>0,\quad P(Y\ge u)\le2\exp\Big(-\frac{u^2}{B^2}\Big)\,,\tag{2.23}$$

prove similarly (or deduce from (a)) that $E\exp(Y^2/2B^2)\le L$ and that for
p ≥ 1 one has

$$(EY^p)^{1/p}\le LB\sqrt p\,.\tag{2.24}$$

(c) Consider a r.v. $Y\ge0$ and a number B > 0. Assuming that for p ≥ 1 we have
$(EY^p)^{1/p}\le Bp$, prove that for u > 0 we have $P(Y\ge u)\le2\exp(-u/(LB))$.
Assuming that for each p ≥ 1 we have $(EY^p)^{1/p}\le B\sqrt p$, prove that for u > 0
we have $P(Y\ge u)\le2\exp(-u^2/(LB^2))$.


In words, (2.22) states that “as p increases, the $L^p$ norm of an exponentially
integrable r.v. does not grow faster than p”, and (2.24) asserts that if the square
of the r.v. is exponentially integrable, then its $L^p$ norm does not grow faster than
$\sqrt p$. These two statements are closely related. More generally, it is very classical to
relate the size of the tails of a r.v. with the rate of growth of its $L^p$ norm. This is not
explicitly used in the sequel, but is good to know as background information. As the
following shows, (2.24) provides the correct rate of growth in the case of Gaussian
r.v.s.


Exercise 2.3.9 If $g$ is a standard Gaussian r.v., it follows from (2.24) that for $p \ge 1$, one has $(\mathrm{E}|g|^p)^{1/p} \le L\sqrt{p}$. Prove one has also

$$(\mathrm{E}|g|^p)^{1/p} \ge \frac{\sqrt{p}}{L}. \tag{2.25}$$


One knows how to compute exactly $\mathrm{E}|g|^p$, from which one can deduce (2.25). You are, however, asked to provide a proof in the spirit of this work by deducing (2.25) solely from the information that, say, for $u > 0$, we have (choosing on purpose crude constants) $P(|g| \ge u) \ge \exp(-u^2)/100$.
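As a numerical sanity check only (it uses the exact formula that the exercise asks you to avoid), the standard identity $\mathrm{E}|g|^p = 2^{p/2}\Gamma((p+1)/2)/\sqrt{\pi}$ lets one tabulate $(\mathrm{E}|g|^p)^{1/p}/\sqrt{p}$. The ratio stays between fixed constants, consistent with (2.24) and (2.25): it starts near 0.80 at $p = 1$ and approaches $e^{-1/2} \approx 0.61$ for large $p$. The function name is our own.

```python
import math

def gaussian_lp_norm(p):
    """(E|g|^p)^(1/p) for a standard Gaussian g, via E|g|^p = 2^(p/2) Gamma((p+1)/2) / sqrt(pi)."""
    # Work with logarithms so that large p does not overflow.
    log_moment = 0.5 * p * math.log(2.0) + math.lgamma((p + 1) / 2) - 0.5 * math.log(math.pi)
    return math.exp(log_moment / p)

for p in [1, 2, 4, 16, 64, 256, 1024]:
    ratio = gaussian_lp_norm(p) / math.sqrt(p)
    print(f"p = {p:5d}   (E|g|^p)^(1/p) / sqrt(p) = {ratio:.4f}")
```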

You will find basically no exact computations in this book. The aim is different. 
We study quantities which are far too complicated to be computed exactly, and we 
try to bound them from above and sometimes from below by simpler quantities with 
as little a gap as possible between the upper and the lower bounds. Ideally the gap 
is only a (universal) multiplicative constant. 
2.4 The Generic Chaining


We go back to our main story. The bound (2.9) (and hence (2.15)) will be effective if the variables $X_t - X_{t_0}$ are rather uncorrelated (and if there are not too many of them). But it will be a disaster if many of the variables $(X_t)_{t\in T}$ are nearly identical. Thus, it seems a good idea to gather those variables $X_t$ which are nearly identical. To do this, we consider a subset $T_1$ of $T$, and for $t$ in $T$, we consider a point $\pi_1(t)$ in $T_1$, which we think of as a (first) approximation of $t$. The elements of $T$ which correspond to the same point $\pi_1(t)$ are, at this level of approximation, considered as identical. We then write

$$X_t - X_{t_0} = X_t - X_{\pi_1(t)} + X_{\pi_1(t)} - X_{t_0}. \tag{2.26}$$

The idea is that it will be effective to use (2.9) for the variables $X_{\pi_1(t)} - X_{t_0}$, because there are not too many of them and, if we have done a good job at finding $\pi_1(t)$, they are rather different from each other (at least in some global sense). On the other hand, since $\pi_1(t)$ is an approximation of $t$, the variables $X_t - X_{\pi_1(t)}$ are “smaller” than the original variables $X_t - X_{t_0}$, so that their supremum should be easier to handle. The procedure will then be iterated.

Let us set up the general procedure. For $n \ge 0$, we consider a subset $T_n$ of $T$, and for $t \in T$, we consider $\pi_n(t)$ in $T_n$. (The idea is that the points $\pi_n(t)$ are successive approximations of $t$.) We assume that $T_0$ consists of a single element $t_0$, so that $\pi_0(t) = t_0$ for each $t$ in $T$. The fundamental relation is

$$X_t - X_{t_0} = \sum_{n\ge1}\big(X_{\pi_n(t)} - X_{\pi_{n-1}(t)}\big), \tag{2.27}$$

which holds provided we arrange that $\pi_n(t) = t$ for $n$ large enough, in which case the series is actually a finite sum. Relation (2.27) decomposes the increments of the process $X_t - X_{t_0}$ along the “chain” $(\pi_n(t))_{n\ge0}$ (and this is why this method is called “chaining”).

It will be convenient to control the set $T_n$ through its cardinality with the condition

$$\operatorname{card} T_n \le N_n \tag{2.28}$$

where

$$N_0 = 1;\quad N_n = 2^{2^n} \text{ if } n \ge 1. \tag{2.29}$$

Notation (2.29) will be used throughout the book. It is at this stage that the procedure to control $T_n$ differs from the traditional one, and it is the crucial point of the generic chaining method.

It is good to notice right away that $\sqrt{\log N_n}$ is about $2^{n/2}$, which will explain the ubiquity of this latter quantity. The occurrence of the function $\sqrt{\log x}$ itself is related to the fact that it is the inverse of the function $\exp(x^2)$ and that the function $\exp(-x^2)$ governs the size of the tails of a Gaussian r.v. Let us also observe the fundamental inequality

$$N_n^2 \le N_{n+1},$$

which makes it very convenient to work with this sequence.
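The arithmetic of the sequence (2.29) can be checked directly with exact integers. The short Python sketch below (the helper name `N` is ours) verifies $N_n^2 \le N_{n+1}$, $N_nN_{n-1} \le N_{n+1}$, and that $\sqrt{\log N_n} = 2^{n/2}\sqrt{\log 2}$ for $n \ge 1$, which is the precise meaning of “about $2^{n/2}$”.

```python
import math

def N(n):
    """N_0 = 1 and N_n = 2**(2**n) for n >= 1, as in (2.29)."""
    return 1 if n == 0 else 2 ** (2 ** n)

for n in range(1, 8):
    # N_n^2 = N_{n+1} exactly once n >= 1 (and N_0^2 = 1 <= N_1 = 4).
    assert N(n) ** 2 == N(n + 1)
    assert N(n) * N(n - 1) <= N(n + 1)
    # sqrt(log N_n) / 2^{n/2} equals sqrt(log 2), up to rounding.
    ratio = math.sqrt(math.log(N(n))) / 2 ** (n / 2)
    print(n, round(ratio, 6), round(math.sqrt(math.log(2)), 6))
```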
Since $\pi_n(t)$ approximates $t$, it is natural to assume that⁷

$$d(t, \pi_n(t)) = d(t, T_n) := \inf_{s\in T_n} d(t,s). \tag{2.30}$$

For $u > 0$, (2.4) implies

$$P\big(|X_{\pi_n(t)} - X_{\pi_{n-1}(t)}| \ge u2^{n/2}d(\pi_n(t), \pi_{n-1}(t))\big) \le 2\exp(-u^2 2^{n-1}).$$


7 The notation := below stresses that this is a definition, so that you should not worry that your 
memory failed and that you did not see this before. 
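The number of possible pairs $(\pi_n(t), \pi_{n-1}(t))$ is bounded by

$$\operatorname{card} T_n \cdot \operatorname{card} T_{n-1} \le N_nN_{n-1} \le N_{n+1} = 2^{2^{n+1}}.$$

We define the (favorable) event $\Omega_{u,n}$ by

$$\forall t,\quad |X_{\pi_n(t)} - X_{\pi_{n-1}(t)}| \le u2^{n/2}d(\pi_n(t), \pi_{n-1}(t)), \tag{2.31}$$

and we define $\Omega_u = \bigcap_{n\ge1}\Omega_{u,n}$. Then

$$p(u) := P(\Omega_u^c) \le \sum_{n\ge1}P(\Omega_{u,n}^c) \le \sum_{n\ge1} 2\cdot 2^{2^{n+1}}\exp(-u^2 2^{n-1}). \tag{2.32}$$

Here again, at the crucial step, we have used the union bound $P(\Omega_u^c) \le \sum_{n\ge1}P(\Omega_{u,n}^c)$. When $\Omega_u$ occurs, (2.27) yields

$$|X_t - X_{t_0}| \le u\sum_{n\ge1} 2^{n/2}d(\pi_n(t), \pi_{n-1}(t)),$$

so that

$$\sup_{t\in T}|X_t - X_{t_0}| \le uS$$

where

$$S := \sup_{t\in T}\sum_{n\ge1} 2^{n/2}d(\pi_n(t), \pi_{n-1}(t)).$$

Thus

$$P\Big(\sup_{t\in T}|X_t - X_{t_0}| > uS\Big) \le p(u).$$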

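For $n \ge 1$ and $u \ge 3$, we have

$$u^2 2^{n-1} \ge \frac{u^2}{2} + u^22^{n-2} \ge \frac{u^2}{2} + 2^{n+1}$$

(the first inequality because $2^{n-2} \ge 1/2$, the second because $u^22^{n-2} \ge 9\cdot 2^{n-2} \ge 2^{n+1}$), so that the $n$-th term of the last sum in (2.32) is at most $2\exp(-u^2/2)(2/e)^{2^{n+1}}$, from which it follows that

$$p(u) \le L\exp\Big(-\frac{u^2}{2}\Big).$$

We observe here that since $p(u) \le 1$, the previous inequality holds not only for $u \ge 3$ but also for $u > 0$, because $1 \le \exp(9/2)\exp(-u^2/2)$ for $u \le 3$. This type of argument (i.e., changing the universal constant in front of the exponential, cf. Exercise 2.3.3(b)) will be used repeatedly. Therefore,

$$P\Big(\sup_{t\in T}|X_t - X_{t_0}| > uS\Big) \le L\exp\Big(-\frac{u^2}{2}\Big). \tag{2.33}$$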


In particular (2.33) implies

$$\mathrm{E}\sup_{t\in T}X_t \le LS.$$

The triangle inequality yields

$$d(\pi_n(t), \pi_{n-1}(t)) \le d(t, \pi_n(t)) + d(t, \pi_{n-1}(t)) = d(t, T_n) + d(t, T_{n-1}),$$

so that (making the change of variable $n = n' + 1$ in the second sum below)

$$S \le \sup_{t\in T}\sum_{n\ge1}2^{n/2}d(t, T_n) + \sup_{t\in T}\sum_{n\ge1}2^{n/2}d(t, T_{n-1}) \le 3\sup_{t\in T}\sum_{n\ge0}2^{n/2}d(t, T_n),$$

and we have proved the fundamental bound

$$\mathrm{E}\sup_{t\in T}X_t \le L\sup_{t\in T}\sum_{n\ge0}2^{n/2}d(t, T_n). \tag{2.34}$$
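To make the right-hand side of (2.34) concrete, here is a minimal Python sketch for a finite metric space. Given candidate sets $T_0, T_1, \ldots$ with $\operatorname{card} T_n \le N_n$, it evaluates $\sup_t\sum_n 2^{n/2}d(t, T_n)$ and, for comparison, the larger quantity $\sum_n 2^{n/2}\sup_t d(t, T_n)$ obtained by moving the supremum inside the sum. The function names and the toy example (16 points of $[0,1]$) are illustrative assumptions, not taken from the text.

```python
import math

def N(n):
    return 1 if n == 0 else 2 ** (2 ** n)

def dist_to_set(t, S, d):
    return min(d(t, s) for s in S)

def chaining_sum(T, nets, d):
    """sup_{t in T} sum_n 2^{n/2} d(t, T_n): the right-hand side of (2.34) without L."""
    for n, Tn in enumerate(nets):
        if len(Tn) > N(n):
            raise ValueError(f"card T_{n} = {len(Tn)} exceeds N_{n} = {N(n)}")
    return max(sum(2 ** (n / 2) * dist_to_set(t, Tn, d) for n, Tn in enumerate(nets))
               for t in T)

def sup_inside_sum(T, nets, d):
    """sum_n 2^{n/2} sup_t d(t, T_n), which is always at least chaining_sum."""
    return sum(2 ** (n / 2) * max(dist_to_set(t, Tn, d) for t in T)
               for n, Tn in enumerate(nets))

def d(x, y):
    return abs(x - y)

# Toy example: 16 equally spaced points of [0, 1] with the usual distance.
T = [i / 15 for i in range(16)]
nets = [[T[0]], T[::5], T]          # cardinalities 1, 4, 16 <= N_0, N_1, N_2
print(chaining_sum(T, nets, d), sup_inside_sum(T, nets, d))
```

Here the supremum in `chaining_sum` is attained at $t = 13/15$ and equals $(13 + 2\sqrt2)/15$, while `sup_inside_sum` gives $1 + 2\sqrt2/15$.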


Now, how do we construct the sets T,,? It is obvious that we should try to make 
the right-hand side of (2.34) small, but this is obvious only because we have used 
an approach which naturally leads to this bound. In the next section, we investigate 
how this was traditionally done. Before this, we urge the reader to fully understand 
the next exercise. It will be crucial to understand a typical case where the traditional 
methods are not effective. 


Exercise 2.4.1 Consider a countable metric space, $T = \{t_1, t_2, \ldots\}$. Assume that for each $i \ge 2$, we have $d(t_1, t_i) \le 1/\sqrt{\log i}$. Prove that if $T_n = \{t_1, t_2, \ldots, t_{N_n}\}$, then for each $t \in T$, we have $\sum_{n\ge0}2^{n/2}d(t, T_n) \le L$.


We end this section by reviewing at a high level the scheme of the previous proof (which will be used again and again). The goal is to bound $\mathrm{E}Y$ where $Y$ is a r.v. $\ge 0$ (here $Y = \sup_t(X_t - X_{t_0})$). The method consists of two steps:


• Given a parameter $u \ge 0$, one identifies a “good set” $\Omega_u$, where some undesirable events do not happen. As $u$ becomes large, $P(\Omega_u^c)$ becomes small.

• When $\Omega_u$ occurs, we bound $Y$, say $Y \le f(u)$ where $f$ is an increasing function on $\mathbb{R}^+$.
One then obtains the bound

$$\mathrm{E}Y = \int_0^\infty P(Y > u)\,du \le f(0) + \int_{f(0)}^\infty P(Y > u)\,du = f(0) + \int_0^\infty f'(u)P(Y > f(u))\,du, \tag{2.35}$$

where we have used a change of variable in the last equality. Now, since $Y \le f(u)$ on $\Omega_u$, we have $P(Y > f(u)) \le P(\Omega_u^c)$ and finally

$$\mathrm{E}Y \le f(0) + \int_0^\infty f'(u)P(\Omega_u^c)\,du.$$

In practice, we will always have $P(\Omega_u^c) \le L\exp(-u/L)$ and $f(u) = A + u^\alpha B$, yielding the bound $\mathrm{E}Y \le A + K(\alpha)B$.
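The last step can be made explicit. Assuming, for illustration, the exact forms $P(\Omega_u^c) \le Ce^{-u/c}$ and $f(u) = A + u^\alpha B$ with $\alpha \ge 1$ (the constants $C$ and $c$ stand in for the $L$'s of the text), one gets $\int_0^\infty \alpha u^{\alpha-1}B\,Ce^{-u/c}\,du = BCc^\alpha\Gamma(\alpha+1)$, so one may take $K(\alpha) = Cc^\alpha\Gamma(\alpha+1)$. The Python sketch below checks this closed form against a plain midpoint-rule integration; the names are ours.

```python
import math

def bound_numeric(A, B, alpha, C, c, upper=200.0, steps=100_000):
    """f(0) + int_0^upper f'(u) * C * exp(-u / c) du for f(u) = A + u**alpha * B (midpoint rule)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += alpha * u ** (alpha - 1) * B * C * math.exp(-u / c)
    return A + total * h

def bound_closed_form(A, B, alpha, C, c):
    """A + K(alpha) * B with K(alpha) = C * c**alpha * Gamma(alpha + 1)."""
    return A + B * C * c ** alpha * math.gamma(alpha + 1)

print(bound_numeric(1.0, 2.0, 2.0, 3.0, 1.5), bound_closed_form(1.0, 2.0, 2.0, 3.0, 1.5))
```

The midpoint rule is only used here because the integrand is smooth for $\alpha \ge 1$; for $\alpha < 1$ the singularity at $0$ would call for a different quadrature.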


2.5 Entropy Numbers 


For a number of years, chaining was systematically performed (as in Sect. 1.4) by choosing the sets $T_n$ so that $\sup_{t\in T}d(t, T_n)$ is as small as possible for $\operatorname{card} T_n \le N_n$. We define

$$e_n(T) = e_n(T,d) = \inf_{T_n\subset T,\ \operatorname{card} T_n \le N_n}\ \sup_{t\in T}d(t, T_n), \tag{2.36}$$

where the infimum is taken over all subsets $T_n$ of $T$ with $\operatorname{card} T_n \le N_n$. (Since here $T$ is finite, the infimum is actually a minimum.) We call the numbers $e_n(T)$ the entropy numbers.

Let us recall that in a metric space, a (closed) ball is a set of the type $B(t,r) = \{s \in T;\ d(s,t) \le r\}$. Balls are basic sets in a metric space and will be of constant use. It should be obvious how to reformulate (2.36) as follows: $e_n(T)$ is the infimum of the set of numbers $r \ge 0$ such that $T$ can be covered by $\le N_n$ balls of radius $\le r$ (the set $T_n$ in (2.36) being the set of centers of these balls).

Definition (2.36) is not consistent with the conventions of operator theory, which uses $e_{2^n}$ to denote what we call $e_n$.⁸ When $T$ is infinite, the numbers $e_n(T)$ are also defined by (2.36) but are not always finite (e.g., when $T = \mathbb{R}$).

Let us note that since $N_0 = 1$,

$$\frac{\Delta(T)}{2} \le e_0(T) \le \Delta(T). \tag{2.37}$$


8 We can’t help it if operator theory gets it wrong. 
Recalling that $T$ is finite, let us then choose for each $n$ a subset $T_n$ of $T$ with $\operatorname{card} T_n \le N_n$ and $e_n(T) = \sup_{t\in T}d(t, T_n)$. Since $d(t, T_n) \le e_n(T)$ for each $t$, (2.34) implies the following:


Proposition 2.5.1 (Dudley’s Entropy Bound [29]) Under the increment condition (2.4), it holds that

$$\mathrm{E}\sup_{t\in T}X_t \le L\sum_{n\ge0}2^{n/2}e_n(T). \tag{2.38}$$


We proved this bound only when $T$ is finite, but using (2.2), it also extends to the case where $T$ is infinite, as is shown by the following easy fact:
Lemma 2.5.2 If $U$ is a subset of $T$, we have $e_n(U) \le 2e_n(T)$.


The point here is that in the definition of $e_n(U)$, we insist that the balls are centered in $U$, not in $T$.


Proof Indeed, if $a > e_n(T)$, by definition one can cover $T$ by $N_n$ balls (for the distance $d$) with radius $a$, and the intersections of these balls with $U$ are of diameter $\le 2a$. Each such intersection, if nonempty, is contained in the ball of radius $2a$ centered at any of its points, so $U$ can be covered by $N_n$ balls in $U$ with radius $2a$. □


Exercise 2.5.3 Prove that the factor 2 in the inequality $e_n(U) \le 2e_n(T)$ cannot be improved even if $n = 0$.


Dudley’s entropy bound is usually formulated using the covering numbers of Definition 1.4.1. These relate to the entropy numbers by the formula

$$e_n(T) = \inf\{\epsilon;\ N(T,d,\epsilon) \le N_n\}.$$

Indeed, it is obvious by definition of $e_n(T)$ that for $\epsilon > e_n(T)$, we have $N(T,d,\epsilon) \le N_n$, and that if $N(T,d,\epsilon) \le N_n$, we have $e_n(T) \le \epsilon$. Consequently,

$$\epsilon < e_n(T) \Rightarrow N(T,d,\epsilon) > N_n \Rightarrow N(T,d,\epsilon) \ge 1 + N_n.$$

Therefore,

$$\sqrt{\log(1+N_n)}\,\big(e_n(T) - e_{n+1}(T)\big) \le \int_{e_{n+1}(T)}^{e_n(T)}\sqrt{\log N(T,d,\epsilon)}\,d\epsilon.$$

Since $\log(1+N_n) \ge 2^n\log 2$ for $n \ge 0$, summation over $n \ge 0$ yields

$$\sqrt{\log 2}\sum_{n\ge0}2^{n/2}\big(e_n(T) - e_{n+1}(T)\big) \le \int_0^{e_0(T)}\sqrt{\log N(T,d,\epsilon)}\,d\epsilon. \tag{2.39}$$
0 


n>0 
Now,

$$\sum_{n\ge0}2^{n/2}\big(e_n(T) - e_{n+1}(T)\big) = \sum_{n\ge0}2^{n/2}e_n(T) - \sum_{n\ge1}2^{(n-1)/2}e_n(T) \ge \Big(1 - \frac{1}{\sqrt2}\Big)\sum_{n\ge0}2^{n/2}e_n(T),$$

so (2.39) yields

$$\sum_{n\ge0}2^{n/2}e_n(T) \le L\int_0^{e_0(T)}\sqrt{\log N(T,d,\epsilon)}\,d\epsilon. \tag{2.40}$$

Hence Dudley’s bound now appears in the familiar form

$$\mathrm{E}\sup_{t\in T}X_t \le L\int_0^\infty\sqrt{\log N(T,d,\epsilon)}\,d\epsilon. \tag{2.41}$$

Here, since $\log 1 = 0$, the integral takes place in fact over $0 \le \epsilon \le e_0(T)$. The right-hand side is often called Dudley’s entropy integral.
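The two elementary steps behind (2.39) and (2.40) can be checked numerically. The Python sketch below takes an arbitrary non-increasing sequence $e_0 \ge e_1 \ge \cdots \ge e_M \ge 0$, with $e_n = 0$ for $n > M$ (this finite truncation is our simplifying assumption), verifies the summation-by-parts identity $\sum_n 2^{n/2}(e_n - e_{n+1}) = (1 - 1/\sqrt2)\sum_n 2^{n/2}e_n + e_0/\sqrt2$, and evaluates $\sum_n\sqrt{\log(1+N_n)}\,(e_n - e_{n+1})$, the lower bound for the entropy integral used in (2.39). Names and the random test sequence are ours.

```python
import math
import random

def N(n):
    return 1 if n == 0 else 2 ** (2 ** n)

def dudley_pieces(e):
    """For non-increasing e = [e_0, ..., e_M] (and e_n = 0 afterwards), return
    (sum 2^{n/2}(e_n - e_{n+1}), sum 2^{n/2} e_n, sum sqrt(log(1 + N_n)) (e_n - e_{n+1}))."""
    ext = list(e) + [0.0]
    telescoped = sum(2 ** (n / 2) * (ext[n] - ext[n + 1]) for n in range(len(e)))
    dudley_sum = sum(2 ** (n / 2) * e[n] for n in range(len(e)))
    integral_lower = sum(math.sqrt(math.log(1 + N(n))) * (ext[n] - ext[n + 1])
                         for n in range(len(e)))
    return telescoped, dudley_sum, integral_lower

rng = random.Random(0)
e = sorted((rng.random() for _ in range(10)), reverse=True)
telescoped, dudley_sum, integral_lower = dudley_pieces(e)
print(telescoped, (1 - 1 / math.sqrt(2)) * dudley_sum + e[0] / math.sqrt(2))
print(integral_lower >= math.sqrt(math.log(2)) * telescoped)
```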


Exercise 2.5.4 Prove that

$$\int_0^\infty\sqrt{\log N(T,d,\epsilon)}\,d\epsilon \le L\sum_{n\ge0}2^{n/2}e_n(T),$$

showing that (2.38) is not an improvement over (2.41).


Exercise 2.5.5 Assume that for each $0 < \epsilon < A$ and some $\alpha > 0$, we have $\log N(T,d,\epsilon) \le (A/\epsilon)^\alpha$. Prove that $e_n(T) \le K(\alpha)A2^{-n/\alpha}$.


Here $K(\alpha)$ is a number depending only on $\alpha$.⁹ This and similar notation are used throughout the book. It is understood that such numbers need not be the same on every occurrence, and it would help to remember this at all times. The difference between the notations $K$ and $L$ is that $L$ is a universal constant, i.e., a number that does not depend on anything, while $K$ might depend on some parameters, such as $\alpha$ here.

When writing a bound such as (2.41), the immediate question is: how sharp is it? The word “sharp” is commonly used, even though people do not agree on what it means exactly. Let us say that a bound of the type $A \le LB$ can be reversed if it is true that $B \le LA$. We are not concerned with the value of the universal constants.¹⁰ Inequalities which can be reversed are our best possible goal. Then, in any circumstance, $A$ and $B$ are of the same order.

9 It just happens that in this particular case $K(\alpha) = 1$ works, but we typically do not care about the precise dependence of $K(\alpha)$ on $\alpha$.

We give now a simple (and classical) example that illustrates well the difference between Dudley’s bound (2.38) and the bound (2.34) and which shows in particular that Dudley’s bound cannot be reversed. Consider an independent sequence $(g_i)_{i\ge1}$ of standard Gaussian r.v.s. Set $X_1 = 0$, and for $i \ge 2$, set

$$X_i = \frac{g_i}{\sqrt{\log i}}. \tag{2.42}$$

Consider an integer $s \ge 3$ and the process $(X_i)_{1\le i\le N_s}$, so the index set is $T = \{1, 2, \ldots, N_s\}$. The distance $d$ associated with the process, given by $d(i,j)^2 = \mathrm{E}(X_i - X_j)^2$, satisfies $d(i,j)^2 = 1/\log i + 1/\log j$ for $i, j \ge 2$, $i \ne j$, and hence

$$\frac{1}{\sqrt{\log(\min(i,j))}} \le d(i,j) \le \frac{2}{\sqrt{\log(\min(i,j))}}. \tag{2.43}$$

Consider $1 \le n \le s-1$ and $T_n \subset T$ with $\operatorname{card} T_n = N_n$. There exists $i \le N_n + 1$ with $i \notin T_n$. Then (2.43) implies that $d(i,j) \ge 2^{-n/2}/L$ for $j \in T_n$. This proves that the balls of radius $2^{-n/2}/L$ centered on $T_n$ do not cover $T$, so that $e_n(T) \ge 2^{-n/2}/L$. Therefore,

$$\sum_{n\ge0}2^{n/2}e_n(T) \ge \frac{s}{L}. \tag{2.44}$$

In the reverse direction, since for $i > 1$ we have $d(1,i) \le 1/\sqrt{\log i}$, Exercise 2.4.1 proves that the bound (2.34) is $\le L$. Thus, the bound (2.38) is worse than the bound (2.34) by a factor about $s$.
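The gap in this example can be tabulated. For the process (2.42) indexed by $T = \{1, \ldots, N_s\}$, the Python sketch below (helper names are ours) evaluates the lower bound $\sum_{1\le n\le s-1}2^{n/2}/\sqrt{\log(N_n+1)}$ for the Dudley sum that comes from (2.43) (ignoring the constant $L$), and an upper bound for $\sup_t\sum_n 2^{n/2}d(t, T_n)$ with $T_n = \{1, \ldots, N_n\}$ as in Exercise 2.4.1, using $d(i, T_n) = 0$ if $i \le N_n$ and $d(i, T_n) \le d(i, 1) = 1/\sqrt{\log i}$ otherwise. The first quantity grows linearly in $s$; the second stays bounded (near 4).

```python
import math

def N(n):
    """N_0 = 1 and N_n = 2**(2**n) for n >= 1, as in (2.29)."""
    return 1 if n == 0 else 2 ** (2 ** n)

def dudley_lower(s):
    """sum_{1 <= n <= s-1} 2^{n/2} / sqrt(log(N_n + 1)): grows linearly in s."""
    return sum(2 ** (n / 2) / math.sqrt(math.log(N(n) + 1)) for n in range(1, s))

def chaining_upper(s):
    """Upper bound for sup_{i <= N_s} sum_n 2^{n/2} d(i, T_n) with T_n = {1, ..., N_n}.

    For i with N_{m-1} < i <= N_m, only the terms n <= m - 1 are nonzero, each at
    most 2^{n/2} / sqrt(log i); the worst such i is i = N_{m-1} + 1.
    """
    best = 0.0
    for m in range(1, s + 1):
        i = N(m - 1) + 1
        partial = sum(2 ** (n / 2) for n in range(m)) / math.sqrt(math.log(i))
        best = max(best, partial)
    return best

for s in (3, 6, 9, 12):
    print(s, round(dudley_lower(s), 3), round(chaining_upper(s), 3))
```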


Exercise 2.5.6 Prove that when $T$ is finite, the bound (2.41) cannot be worse than (2.34) by a factor greater than about $\log\log\operatorname{card} T$. This shows that the previous example is in a sense extremal. Hint: Use $2^{n/2}e_n(T) \le L\sup_{t\in T}\sum_{k\ge0}2^{k/2}d(t, T_k)$ and $e_n(T) = 0$ if $N_n \ge \operatorname{card} T$.


How does one estimate covering numbers (or, equivalently, entropy numbers)? Let us first stress a trivial but nonetheless fundamental fact.


Lemma 2.5.7 Consider a number $\epsilon > 0$ and a subset $W$ of $T$ maximal with respect to the property

$$s, t \in W,\ s \ne t \Rightarrow d(s,t) > \epsilon.$$

Then, $N(T,d,\epsilon) \le \operatorname{card} W$.


10 Not that these values are unimportant, but our methods are not appropriate for this.
Proof Since $W$ is maximal, the balls of radius $\epsilon$ centered at the points of $W$ cover $T$. □
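Lemma 2.5.7 is also an algorithm: for a finite $T$, scanning the points once and keeping each point at distance $> \epsilon$ from all points kept so far produces a maximal set $W$ as in the lemma. The Python sketch below (our own illustration, on a finite grid in the Euclidean unit disc of $\mathbb{R}^2$) checks that $W$ is $\epsilon$-separated and that the balls of radius $\epsilon$ centered at $W$ cover the grid. It also compares $\operatorname{card} W$ with $(1 + 2/\epsilon)^2$, the bound that the volume comparison of Exercise 2.5.9 gives for $\epsilon$-separated subsets of the disc.

```python
import math

def maximal_separated_subset(points, d, eps):
    """Greedy choice of W with: s, t in W, s != t  =>  d(s, t) > eps, maximal for this property."""
    W = []
    for x in points:
        if all(d(x, w) > eps for w in W):
            W.append(x)
    return W

# A finite grid inside the Euclidean unit disc of R^2.
step = 0.05
grid = [(i * step, j * step) for i in range(-20, 21) for j in range(-20, 21)
        if (i * step) ** 2 + (j * step) ** 2 <= 1.0]
eps = 0.25
W = maximal_separated_subset(grid, math.dist, eps)

# Separation, and covering (so that N(grid, d, eps) <= card W).
assert all(math.dist(a, b) > eps for a in W for b in W if a != b)
assert all(min(math.dist(x, w) for w in W) <= eps for x in grid)
print(len(W), (1 + 2 / eps) ** 2)
```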


Exercise 2.5.8 Consider a probability measure $\mu$ on $T$, a number $\epsilon > 0$ and a number $a > 0$. Let $U = \{t \in T;\ \mu(B(t,\epsilon)) \ge a\}$. Prove that $N(U,d,2\epsilon) \le 1/a$.


The next exercise introduces the reader to “volume estimates”, a simple yet 
fundamental method for this purpose. It deserves to be fully understood. If this 
exercise is too hard, you can find all the details below in the proof of Lemma 2.13.7. 


Exercise 2.5.9 


(a) If $(T,d)$ is a metric space, define the packing number $N^*(T,d,\epsilon)$ as the largest integer $N$ such that $T$ contains $N$ points with mutual distances $\ge \epsilon$. Prove that $N(T,d,\epsilon) \le N^*(T,d,\epsilon)$. Prove that if $\epsilon' > 2\epsilon$, then $N^*(T,d,\epsilon') \le N(T,d,\epsilon)$.

(b) Consider a distance $d$ on $\mathbb{R}^k$ which arises from a norm $\|\cdot\|$, $d(x,y) = \|x - y\|$, and denote by $B$ the unit ball of center 0. Let us denote by $\mathrm{Vol}(A)$ the $k$-dimensional volume of a subset $A$ of $\mathbb{R}^k$. By comparing volumes, prove that for any subset $A$ of $\mathbb{R}^k$,

$$N(A,d,\epsilon) \ge \frac{\mathrm{Vol}(A)}{\mathrm{Vol}(\epsilon B)} \tag{2.45}$$

and

$$N(A,d,2\epsilon) \le N^*(A,d,2\epsilon) \le \frac{\mathrm{Vol}(A + \epsilon B)}{\mathrm{Vol}(\epsilon B)}. \tag{2.46}$$

(c) Conclude that

$$\Big(\frac{1}{\epsilon}\Big)^k \le N(B,d,\epsilon) \le \Big(\frac{2+\epsilon}{\epsilon}\Big)^k. \tag{2.47}$$

(d) Use (c) to find estimates of $e_n(B)$ of the correct order for each value of $n$. Hint: $e_n(B)$ is about $2^{-2^n/k}$. This decreases very fast as $n$ increases. Estimate Dudley’s bound for $B$ provided with the distance $d$.

(e) Prove that if $T$ is a subset of $\mathbb{R}^k$ and if $n_0$ is any integer, then for $n \ge n_0$, one has $e_{n+1}(T) \le L2^{-2^n/k}e_{n_0}(T)$. Hint: Cover $T$ by $N_{n_0}$ balls of radius $2e_{n_0}(T)$, and cover each of these by balls of smaller radius using (d).

(f) This part provides a generalization of (2.45) and (2.46) to a more abstract setting but with the same proofs. Consider a metric space $(T,d)$ and a positive measure $\mu$ on $T$ such that all balls of a given radius have the same measure, $\mu(B(t,\epsilon)) = \varphi(\epsilon)$ for each $\epsilon > 0$ and each $t \in T$. For a subset $A$ of $T$ and $\epsilon > 0$, let $A_\epsilon = \{t \in T;\ d(t,A) \le \epsilon\}$, where $d(t,A) = \inf_{s\in A}d(t,s)$. Prove that

$$\frac{\mu(A)}{\varphi(2\epsilon)} \le N(A,d,2\epsilon) \le \frac{\mu(A_\epsilon)}{\varphi(\epsilon)}.$$
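The estimates asked for in (d) can be read off from (2.47). Since $e_n(B) = \inf\{\epsilon;\ N(B,d,\epsilon) \le N_n\}$, the lower bound in (2.47) gives $e_n(B) \ge N_n^{-1/k}$, while the upper bound gives $e_n(B) \le 2/(N_n^{1/k} - 1)$ whenever $N_n^{1/k} > 1$; trivially also $e_n(B) \le 1$, since $B$ is the ball of radius 1 centered at $0 \in B$. The Python sketch below (our own, restricted to moderate $n$ so that floating-point numbers do not overflow) evaluates these two bounds; they agree within a factor at most 3, consistent with the hint that $e_n(B)$ is about $2^{-2^n/k}$.

```python
def unit_ball_entropy_bounds(n, k):
    """Lower and upper bounds on e_n(B) for the unit ball B of a norm on R^k, read off (2.47)."""
    t = 2.0 ** (2 ** n / k) if n >= 1 else 1.0   # t = N_n^(1/k)
    lower = 1.0 / t
    upper = 1.0 if t <= 1.0 else min(1.0, 2.0 / (t - 1.0))
    return lower, upper

for k in (1, 4, 16, 64):
    row = []
    for n in range(0, 8):
        lo, hi = unit_ball_entropy_bounds(n, k)
        row.append(f"{lo:.2e}..{hi:.2e}")
    print(k, row)
```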
There are many simple situations where Dudley’s bound is not of the correct order. We gave a first example on page 35. We give another such example in Exercise 2.5.11. There the set $T$ is particularly appealing: it is a simplex in $\mathbb{R}^m$. Yet other examples based on fundamental geometry (ellipsoids in $\mathbb{R}^k$) are explained in Sect. 2.13.

The result of the following exercise is very useful in all kinds of examples. 


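Exercise 2.5.10 Consider two integers $k, m$ with $k \le m/4$. Assume for simplicity that $k$ is even.


(a) Prove that

$$\sum_{0\le\ell\le k/2}\binom{m}{\ell} \le \Big(\frac{2em}{k}\Big)^{k/2}. \tag{2.48}$$

(b) Denote by $\mathcal{I}$ the class of subsets of $\{1, \ldots, m\}$ of cardinality $k$. Prove that you can find in $\mathcal{I}$ a family $\mathcal{F}$ such that for $I, J \in \mathcal{F}$, $I \ne J$, one has $\operatorname{card}((I\setminus J)\cup(J\setminus I)) \ge k/2$ and $\operatorname{card}\mathcal{F} \ge (m/(2k))^{k/2}/2$. Hint: Use (a) and part (f) of Exercise 2.5.9 for the counting measure on $\mathcal{I}$. Warning: This is not so easy.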
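As we read it, (2.48) is the classical estimate $\sum_{\ell\le j}\binom{m}{\ell} \le (em/j)^j$ taken at $j = k/2$. A quick exact-integer check in Python, over a range of even $k$ and $m \ge 4k$ (the range is our arbitrary choice):

```python
import math

def lhs(m, k):
    """sum_{0 <= l <= k/2} binom(m, l), computed exactly."""
    return sum(math.comb(m, l) for l in range(k // 2 + 1))

def rhs(m, k):
    """(2 e m / k)^(k/2)."""
    return (2 * math.e * m / k) ** (k // 2)

worst = max(lhs(m, k) / rhs(m, k) for k in range(2, 41, 2) for m in range(4 * k, 4 * k + 200))
print(worst)   # by (2.48) this ratio should stay below 1
```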


Exercise 2.5.11 Consider an integer $m$ and an i.i.d. standard Gaussian sequence $(g_i)_{i\le m}$. For $t = (t_i)_{i\le m} \in \mathbb{R}^m$, let $X_t = \sum_{i\le m}t_ig_i$. This is called the canonical Gaussian process on $\mathbb{R}^m$. Its associated distance is the Euclidean distance on $\mathbb{R}^m$. It will be much used later. Consider the set

$$T = \Big\{(t_i)_{i\le m} \in \mathbb{R}^m;\ t_i \ge 0,\ \sum_{i\le m}t_i = 1\Big\}, \tag{2.49}$$

the convex hull of the canonical basis. By (2.16), we have $\mathrm{E}\sup_{t\in T}X_t = \mathrm{E}\sup_{i\le m}g_i \le L\sqrt{\log m}$. Prove, however, that the right-hand side of (2.41) is $\ge (\log m)^{3/2}/L$. (Hint: For an integer $k \le m$, consider the subset $T_k$ of $T$ consisting of sequences $t = (t_i)_{i\le m} \in T$ for which $t_i \in \{0, 1/k\}$, so that $t \in T_k$ is determined by the set $I = \{i \le m;\ t_i = 1/k\}$ and $\operatorname{card} I = k$. Using Exercise 2.5.10, prove that $\log N(T_k, d, 1/(L\sqrt{k})) \ge k\log(em/k)/L$ and conclude.¹¹) Thus, in this case, Dudley’s bound is off by a multiplicative factor of about $\log m$. Exercise 2.7.9 will show that in $\mathbb{R}^m$ the situation cannot be worse than this.
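For the simplex (2.49), $\sup_{t\in T}X_t = \max_{i\le m}g_i$, because a linear function on a convex hull is maximized at one of the extreme points. The Monte Carlo sketch below (sample size and seed are arbitrary choices of ours) estimates $\mathrm{E}\max_{i\le m}g_i$ and compares it with $\sqrt{\log m}$ and with the classical upper bound $\sqrt{2\log m}$. It illustrates that $\mathrm{E}\sup_{t\in T}X_t$ is of order $\sqrt{\log m}$, far below the $(\log m)^{3/2}$ size of the entropy integral.

```python
import math
import random

def expected_max_gaussian(m, samples, seed=0):
    """Monte Carlo estimate of E max_{i <= m} g_i for i.i.d. standard Gaussians g_i."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += max(rng.gauss(0.0, 1.0) for _ in range(m))
    return total / samples

for m in (10, 100, 1000):
    est = expected_max_gaussian(m, samples=500)
    print(m, round(est, 3), round(math.sqrt(math.log(m)), 3), round(math.sqrt(2 * math.log(m)), 3))
```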


11 In case you wonder why $e$ occurs in this formula, it is just to take care of the case where $k$ is nearly $m$. This term is not needed here, but is important in upper bounds of the same nature that we will use below.
2.6 Rolling Up Our Sleeves: Chaining in the Simplex


The bound (2.34) seems to be genuinely better than the bound (2.38) because when going from (2.34) to (2.38) we have used the somewhat brutal inequality:

$$\sup_{t\in T}\sum_{n\ge0}2^{n/2}d(t, T_n) \le \sum_{n\ge0}2^{n/2}\sup_{t\in T}d(t, T_n).$$


The method leading to the bound (2.34) is probably the most important idea of this work. The fact that it appears now so naturally does not reflect the history of the subject, but rather that the proper approach is being used. When using this bound, we will choose the sets $T_n$ in order to minimize the right-hand side of (2.34) instead of choosing them as in (2.36). As we will demonstrate later, this provides essentially the best possible bound for $\mathrm{E}\sup_{t\in T}X_t$. It is remarkable that despite the fact that this result holds in complete generality, it is a non-trivial task to find sets $T_n$ witnessing this, even in very simple situations. In the present situation, we perform this task by an explicit construction for the set $T$ of (2.49).


Proposition 2.6.1 There exist sets $T_n \subset \mathbb{R}^m$ with $\operatorname{card} T_n \le N_n$ such that

$$\sup_{t\in T}\sum_{n\ge0}2^{n/2}d(t, T_n) \le L\sqrt{\log m}\ \Big(\le L\,\mathrm{E}\sup_{t\in T}X_t\Big).$$

Of course here $d$ is the Euclidean distance in $\mathbb{R}^m$. The reader may try to find these sets herself before reading the rest of this section, as there seems to be no better way to get convinced of the depth of the present theory. The sets $T_n$ are not subsets of $T$. Please figure out by yourself how to correct this.¹²


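Lemma 2.6.2 For each $t \in T$, we can find a sequence $(p(n,t))_{n\ge0}$ of integers $0 \le p(n,t) \le 2n$ with the following properties:

$$\sum_{n\ge0}2^{n-p(n,t)} \le L, \tag{2.50}$$

$$\forall n \ge 0,\quad p(n+1,t) \le p(n,t) + 2, \tag{2.51}$$

$$\operatorname{card}\{i \le m;\ t_i > 2^{-p(n,t)}\} < 2^n. \tag{2.52}$$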


Proof There is no loss of generality to assume that the sequence $(t_i)_{i\le m}$ is non-increasing. We set $t_i = 0$ for $i > m$. Then for any $n \ge 1$ and $2^{n-1} < i \le 2^n$, we have $t_i \ge t_{2^n}$, so that $2^{n-1}t_{2^n} \le \sum_{2^{n-1}<i\le2^n}t_i$. By summation over $n \ge 1$, we obtain $\sum_{n\ge1}2^nt_{2^n} \le 2$, and thus $\sum_{n\ge0}2^nt_{2^n} \le 3$. For $n \ge 0$, consider the largest integer $q(n,t) \le 2n$ such that $2^{-q(n,t)} \ge t_{2^n}$ (note that $q(n,t) \ge 0$ since $t_{2^n} \le 1$). Thus, $2^{-q(n,t)-1} < t_{2^n}$ when $q(n,t) < 2n$. In any case, $2^{-q(n,t)} \le 2t_{2^n} + 2^{-2n}$, and thus $\sum_{n\ge0}2^{n-q(n,t)} \le L$. Also if $t_i > 2^{-q(n,t)}$ $(\ge t_{2^n})$, then $i < 2^n$. In particular $\operatorname{card}\{i \le m;\ t_i > 2^{-q(n,t)}\} < 2^n$. Finally we define

$$p(n,t) = \min\{q(k,t) + 2(n-k);\ 0 \le k \le n\}.$$

Taking $k = n$ shows that $p(n,t) \le q(n,t) \le 2n$, implying (2.52). If $k \le n$ is such that $p(n,t) = q(k,t) + 2(n-k)$, then $p(n+1,t) \le q(k,t) + 2(n+1-k) = p(n,t) + 2$, proving (2.51). Also, since $2^{n-p(n,t)} \le \sum_{k\le n}2^{n-q(k,t)-2(n-k)} = \sum_{k\le n}2^{k-q(k,t)-(n-k)}$, we have

$$\sum_{n\ge0}2^{n-p(n,t)} \le \sum_{k\ge0}\sum_{n\ge k}2^{k-q(k,t)-(n-k)} \le 2\sum_{k\ge0}2^{k-q(k,t)} \le L. \qquad \square$$

12 The argument can be found in Sect. 2.14.
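The construction in this proof is completely explicit, so it can be run. The Python sketch below (helper names are ours) computes $q(n,t)$ and $p(n,t)$ for a point $t$ of the simplex, sorted non-increasingly, and checks (2.50) to (2.52) on a few examples. The value 16 used for (2.50) is our own bookkeeping of the constants in the proof ($\sum_k 2^{k-q(k,t)} \le 2\cdot 3 + 2 = 8$, then a further factor 2), not a number from the text.

```python
import random

def q_sequence(t, n_max):
    """q(n, t) = largest integer q <= 2n with 2^{-q} >= t_{2^n} (t non-increasing, t_i = 0 for i > m)."""
    qs = []
    for n in range(n_max + 1):
        idx = 2 ** n                                   # 1-based index 2^n
        x = t[idx - 1] if idx <= len(t) else 0.0
        q = 2 * n
        while q > 0 and 2.0 ** (-q) < x:
            q -= 1
        qs.append(q)
    return qs

def p_sequence(t, n_max):
    """p(n, t) = min_{0 <= k <= n} (q(k, t) + 2(n - k))."""
    q = q_sequence(t, n_max)
    return [min(q[k] + 2 * (n - k) for k in range(n + 1)) for n in range(n_max + 1)]

def check_lemma_2_6_2(t, n_max=20):
    """Assert (2.51), (2.52) and 0 <= p <= 2n; return the partial sum appearing in (2.50)."""
    t = sorted(t, reverse=True)
    p = p_sequence(t, n_max)
    assert all(0 <= p[n] <= 2 * n for n in range(n_max + 1))
    assert all(p[n + 1] <= p[n] + 2 for n in range(n_max))                               # (2.51)
    assert all(sum(ti > 2.0 ** (-p[n]) for ti in t) < 2 ** n for n in range(n_max + 1))  # (2.52)
    return sum(2.0 ** (n - p[n]) for n in range(n_max + 1))

rng = random.Random(1)
for m in (1, 10, 100, 1000):
    w = [rng.expovariate(1.0) for _ in range(m)]
    s = sum(w)
    print(m, check_lemma_2_6_2([x / s for x in w]))
print("vertex", check_lemma_2_6_2([1.0] + [0.0] * 99))
print("uniform", check_lemma_2_6_2([1 / 500] * 500))
```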


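Given a set $I \subset \{1, \ldots, m\}$ and an integer $p$, we denote by $V_{I,p}$ the set of elements $u = (u_i)_{i\le m} \in \mathbb{R}^m$ such that $u_i = 0$ if $i \notin I$ and $u_i = r_i2^{-p}$ if $i \in I$, where $r_i$ is an integer $0 \le r_i \le 3$. Then $\operatorname{card} V_{I,p} \le 4^{\operatorname{card} I}$. For $n \ge 1$, we denote by $V_n$ the union of all the sets $V_{I,p}$ for $\operatorname{card} I \le 2^n$ and $0 \le p \le 2n$. Crudely we have $\operatorname{card} V_n \le m^{L2^n}$. We set $V_0 = \{0\}$ and for $n \ge 1$ we denote by $U_n$ the set of all sums $\sum_{0\le k\le n}x_k$ where $x_k \in V_k$. Then $\operatorname{card} U_n \le m^{L2^n}$.¹³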


Lemma 2.6.3 Consider $t \in T$ and the sequence $(p(n,t))_{n\ge0}$ constructed in Lemma 2.6.2. Then for each $n$, we can write $t = u(n) + v(n)$ where $u(n) \in U_n$ and where $v(n) = (v(n)_i)_{i\le m}$ satisfies $0 \le v(n)_i \le \min(t_i, 2^{-p(n,t)})$.


Proof The proof is by induction over $n$. For $n = 0$, we set $u(0) = 0$, $v(0) = t$ (this works since $p(0,t) = 0$ and $t_i \le 1$). For the induction from $n$ to $n+1$, consider the set $J = \{i \le m;\ v(n)_i > 2^{-p(n+1,t)}\}$. Since $v(n)_i \le t_i$, it follows from (2.52) that $\operatorname{card} J < 2^{n+1}$. For each $i \in J$, let $r_i$ be the largest integer with $r_i2^{-p(n+1,t)} < v(n)_i$, so that $v(n)_i - r_i2^{-p(n+1,t)} \le 2^{-p(n+1,t)}$. Since $v(n)_i \le 2^{-p(n,t)}$ by induction and since $p(n+1,t) \le p(n,t) + 2$ by (2.51), we have $r_i \le 3$. Define $u = (u_i)_{i\le m} \in \mathbb{R}^m$ by $u_i = r_i2^{-p(n+1,t)}$ if $i \in J$ and $u_i = 0$ otherwise. Then, $u \in V_{J,p(n+1,t)} \subset V_{n+1}$. Thus, $t = u(n+1) + v(n+1)$ where $u(n+1) := u(n) + u \in U_{n+1}$ and $v(n+1) := v(n) - u$ satisfies $0 \le v(n+1)_i \le \min(t_i, 2^{-p(n+1,t)})$. □


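Lemma 2.6.4 For each $t \in T$, we have $\sum_{n\ge1}2^{n/2}d(t, U_n) \le L$.


Proof Consider the sequence $(v(n))_{n\ge0}$ constructed in Lemma 2.6.3, so that $d(t, U_n) \le \|v(n)\|_2$ since $t = u(n) + v(n)$. Let $I_n = \{i \le m;\ t_i > 2^{-p(n,t)}\}$ so that by (2.52) we have $\operatorname{card} I_n < 2^n$. For $n \ge 1$, set $J_n = I_n \setminus I_{n-1}$ so that for $i \in J_n$, we have $t_i \le 2^{-p(n-1,t)}$. Each $i \notin I_n$ with $t_i > 0$ belongs to $J_k$ for the smallest $k > n$ with $i \in I_k$, so that $\|v(n)\|_2^2 = \sum_{i\le m}v(n)_i^2 \le \sum_{i\in I_n}v(n)_i^2 + \sum_{k>n}\sum_{i\in J_k}v(n)_i^2$. Since $v(n)_i \le 2^{-p(n,t)}$ and $\operatorname{card} I_n \le 2^n$, the first sum is $\le 2^{n-2p(n,t)}$. Since $v(n)_i \le t_i \le 2^{-p(k-1,t)}$ for $i \in J_k$ and $\operatorname{card} J_k \le 2^k$, we have $\sum_{i\in J_k}v(n)_i^2 \le 2^k\cdot 2^{-2p(k-1,t)}$. Thus, using (2.51), $\|v(n)\|_2^2 \le \sum_{k\ge n}2^{k+4-2p(k,t)}$, so that $\|v(n)\|_2 \le \sum_{k\ge n}2^{k/2+2-p(k,t)}$ and

$$\sum_{n\ge1}2^{n/2}\|v(n)\|_2 \le 4\sum_{n\ge1}\sum_{k\ge n}2^{n/2}2^{k/2-p(k,t)} = 4\sum_{k\ge1}2^{k/2-p(k,t)}\sum_{n\le k}2^{n/2} \le L\sum_{k\ge1}2^{k-p(k,t)} \le L, \tag{2.53}$$

where we have used (2.50) in the last inequality. □

13 Controlling the cardinality of $U_n$ is the key point.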
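The decomposition of Lemma 2.6.3 is explicit as well. The Python sketch below is self-contained, so it recomputes $p(n,t)$; all names are ours. It runs the induction step of that proof, checks at each step that $\operatorname{card} J < 2^{n+1}$, that $1 \le r_i \le 3$ and that $0 \le v(n)_i \le \min(t_i, 2^{-p(n,t)})$, and then evaluates partial sums of $\sum_{n\ge1}2^{n/2}\|v(n)\|_2$, which bounds $\sum_{n\ge1}2^{n/2}d(t, U_n)$. The value 220 in the check is our own generous tracking of the constants in (2.53) (namely $4\cdot\frac{\sqrt2}{\sqrt2-1}\cdot 16 \approx 218.5$), not a number from the text.

```python
import math
import random

def p_sequence(t, n_max):
    """p(n, t) from Lemma 2.6.2, for t sorted non-increasingly."""
    q = []
    for n in range(n_max + 1):
        x = t[2 ** n - 1] if 2 ** n <= len(t) else 0.0
        qn = 2 * n
        while qn > 0 and 2.0 ** (-qn) < x:
            qn -= 1
        q.append(qn)
    return [min(q[k] + 2 * (n - k) for k in range(n + 1)) for n in range(n_max + 1)]

def residuals(t, n_max):
    """v(0), ..., v(n_max) from the proof of Lemma 2.6.3 (u(n) = t - v(n) lies in U_n)."""
    p = p_sequence(t, n_max)
    v = list(t)                                   # v(0) = t, u(0) = 0
    out = [list(v)]
    for n in range(n_max):
        delta = 2.0 ** (-p[n + 1])
        J = [i for i, vi in enumerate(v) if vi > delta]
        assert len(J) < 2 ** (n + 1)
        for i in J:
            r = math.ceil(v[i] / delta) - 1       # largest integer r with r * delta < v_i
            assert 1 <= r <= 3
            v[i] -= r * delta
        assert all(0.0 <= vi <= min(ti, delta) for vi, ti in zip(v, t))
        out.append(list(v))
    return out

def chaining_partial_sum(t, n_max):
    """sum_{1 <= n <= n_max} 2^{n/2} ||v(n)||_2."""
    t = sorted(t, reverse=True)
    vs = residuals(t, n_max)
    return sum(2 ** (n / 2) * math.sqrt(sum(x * x for x in vs[n])) for n in range(1, n_max + 1))

rng = random.Random(2)
for m in (10, 100, 1000):
    w = [rng.expovariate(1.0) for _ in range(m)]
    s = sum(w)
    print(m, round(chaining_partial_sum([x / s for x in w], n_max=24), 3))
```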


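Proof of Proposition 2.6.1 Consider the smallest integer $k_0$ with $m \le N_{k_0}$, so that $2^{k_0/2} \le L\sqrt{\log m}$. Observe also that $m^{2^n} \le (2^{2^{k_0}})^{2^n} = 2^{2^{k_0+n}} = N_{k_0+n}$. Thus, $\operatorname{card} U_n \le m^{L2^n} \le N_{k_0+n+k_1}$, where $k_1$ is a universal constant. For $n \ge k_0 + k_1 + 1$, we set $T_n = U_{n-k_0-k_1}$, so that $\operatorname{card} T_n \le N_n$. For $n \le k_0 + k_1$, we set $T_n = \{0\}$. Finally, given $t \in T$ (and keeping in mind that $k_1$ is a universal constant), we have

$$\sum_{n\ge0}2^{n/2}d(t, T_n) \le L2^{k_0/2} + \sum_{n\ge k_0+k_1+1}2^{n/2}d(t, T_n)$$

and, using Lemma 2.6.4 in the last inequality,

$$\sum_{n\ge k_0+k_1+1}2^{n/2}d(t, T_n) = \sum_{n\ge k_0+k_1+1}2^{n/2}d(t, U_{n-k_0-k_1}) = 2^{(k_0+k_1)/2}\sum_{n\ge1}2^{n/2}d(t, U_n) \le L2^{k_0/2}. \qquad \square$$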


2.7 Admissible Sequences of Partitions 


The idea behind the bound (2.34) admits a technically more convenient formulation.¹⁴


Definition 2.7.1 Given a set $T$, an admissible sequence is an increasing sequence $(\mathcal{A}_n)_{n\ge0}$ of partitions of $T$ such that $\operatorname{card}\mathcal{A}_n \le N_n$, i.e., $\operatorname{card}\mathcal{A}_0 = 1$ and $\operatorname{card}\mathcal{A}_n \le 2^{2^n}$ for $n \ge 1$.


14 We will demonstrate why this is the case only later, in Theorem 4.5.13.
By an increasing sequence of partitions, we mean that every set of $\mathcal{A}_{n+1}$ is contained in a set of $\mathcal{A}_n$. Admissible sequences of partitions will be constructed recursively, by breaking each element $C$ of $\mathcal{A}_n$ into at most $N_n$ pieces, obtaining then a partition $\mathcal{A}_{n+1}$ of $T$ consisting of at most $N_n^2 \le N_{n+1}$ pieces.

Throughout the book, we denote by $A_n(t)$ the unique element of $\mathcal{A}_n$ which contains $t$. The double exponential in the definition of $N_n$ (see (2.29)) occurs simply since for our purposes the proper measure of the “size” of a partition $\mathcal{A}$ is $\log\operatorname{card}\mathcal{A}$. This double exponential ensures that “the size of the partition $\mathcal{A}_n$ doubles at every step”. This offers a number of technical advantages which will become clear gradually.


Theorem 2.7.2 (The Generic Chaining Bound) Under the increment condition (2.4) (and if $\mathrm{E}X_t = 0$ for each $t$), for each admissible sequence $(\mathcal{A}_n)$ we have

$$\mathrm{E}\sup_{t\in T}X_t \le L\sup_{t\in T}\sum_{n\ge0}2^{n/2}\Delta(A_n(t)). \tag{2.54}$$

Here as always, $\Delta(A_n(t))$ denotes the diameter of $A_n(t)$ for $d$. One could think that (2.54) could be much worse than (2.34), but it will turn out that this is not the case when the sequence $(\mathcal{A}_n)$ is appropriately chosen.


Proof We may assume $T$ to be finite. We construct a subset $T_n$ of $T$ by taking exactly one point in each set $A$ of $\mathcal{A}_n$. Then for $t \in T$ and $n \ge 0$, we have $d(t, T_n) \le \Delta(A_n(t))$ and the result follows from (2.34). □


Definition 2.7.3 Given $\alpha > 0$ and a metric space $(T,d)$ (that need not be finite), we define

$$\gamma_\alpha(T,d) = \inf\sup_{t\in T}\sum_{n\ge0}2^{n/\alpha}\Delta(A_n(t)),$$

where the infimum is taken over all admissible sequences.
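A small experiment with Definitions 2.7.1 and 2.7.3: the Python sketch below (our own toy construction, for a finite set of reals with $d(x,y) = |x - y|$) builds an admissible sequence by the recursive scheme described above, splitting each element of $\mathcal{A}_n$ into at most $N_n$ contiguous pieces of nearly equal size. It checks that the sequence is increasing with $\operatorname{card}\mathcal{A}_n \le N_n$ and evaluates $\sup_t\sum_n 2^{n/\alpha}\Delta(A_n(t))$, which is an upper bound for $\gamma_\alpha(T,d)$. Nothing guarantees that this particular sequence is near-optimal.

```python
def N(n):
    return 1 if n == 0 else 2 ** (2 ** n)

def split(block, k):
    """Split a sorted list into at most k contiguous nonempty chunks of nearly equal size."""
    k = min(k, len(block))
    size, extra = divmod(len(block), k)
    chunks, start = [], 0
    for j in range(k):
        end = start + size + (1 if j < extra else 0)
        chunks.append(block[start:end])
        start = end
    return chunks

def admissible_sequence(points, n_max):
    """A_0 = {T}; A_{n+1} breaks each element of A_n into at most N_n pieces."""
    seq = [[sorted(points)]]
    for n in range(n_max):
        seq.append([chunk for block in seq[-1] for chunk in split(block, N(n))])
    return seq

def functional(points, seq, alpha=2.0):
    """sup_t sum_n 2^{n/alpha} diam(A_n(t)), an upper bound for gamma_alpha(T, d)."""
    best = 0.0
    for t in points:
        total = 0.0
        for n, partition in enumerate(seq):
            block = next(b for b in partition if t in b)    # A_n(t)
            total += 2 ** (n / alpha) * (block[-1] - block[0])
        best = max(best, total)
    return best

points = [i / 199 for i in range(200)]
seq = admissible_sequence(points, n_max=5)
for n, partition in enumerate(seq):
    assert len(partition) <= N(n)                                      # card A_n <= N_n
    assert sorted(x for b in partition for x in b) == sorted(points)   # a partition of T
    if n >= 1:                                                         # increasing sequence
        assert all(any(set(b) <= set(c) for c in seq[n - 1]) for b in partition)
print(functional(points, seq, alpha=2.0), functional(points, seq, alpha=1.0))
```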


It is useful to observe that since $A_0(t) = T$, we have $\gamma_\alpha(T,d) \ge \Delta(T)$. The most important cases by far are $\alpha = 2$ and $\alpha = 1$. For the time being, we need only the case $\alpha = 2$. The case $\alpha = 1$ is first met in Theorem 4.5.13, although more general functionals occur first in Definition 4.5.


Exercise 2.7.4 Prove that if $d \le Bd'$, then $\gamma_2(T,d) \le B\gamma_2(T,d')$.


Exercise 2.7.5 Prove that $\gamma_\alpha(T,d) \le K(\alpha)\Delta(T)(\log\operatorname{card} T)^{1/\alpha}$ when $T$ is finite. Hint: Ensure that $\Delta(A_n(t)) = 0$ if $N_n \ge \operatorname{card} T$.


A large part of our arguments will take place in abstract metric spaces, and this 
may represent an obstacle to the reader who has never thought about this. Therefore, 
we cannot recommend too highly the following exercise: 
Exercise 2.7.6


(a) Consider a metric space $(T,d)$, and assume that for each $n \ge 0$, you are given a covering $\mathcal{B}_n$ of $T$ with $\operatorname{card}\mathcal{B}_n \le N_n$. Prove that you can construct an admissible sequence $(\mathcal{A}_n)$ of partitions of $T$ with the following property:

$$\forall n \ge 1,\ \forall A \in \mathcal{A}_n,\ \exists B \in \mathcal{B}_{n-1},\ A \subset B. \tag{2.55}$$

(b) Prove that for any metric space $(T,d)$, we have

$$\gamma_2(T,d) \le L\sum_{n\ge0}2^{n/2}e_n(T). \tag{2.56}$$


The following exercise explains one of the reasons admissible sequences of sets are 
so convenient: given two such sequences, we can construct a third sequence which 
merges the good properties of the two sequences. 


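Exercise 2.7.7 Consider a set $T$ and two admissible sequences $(\mathcal{B}_n)$ and $(\mathcal{C}_n)$. Prove that there is an admissible sequence $(\mathcal{A}_n)$ such that

$$\forall n \ge 1,\ \forall A \in \mathcal{A}_n,\ \exists B \in \mathcal{B}_{n-1},\ A \subset B,\ \exists C \in \mathcal{C}_{n-1},\ A \subset C.$$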
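One natural construction for Exercise 2.7.7 is to take $\mathcal{A}_0 = \{T\}$ and, for $n \ge 1$, $\mathcal{A}_n = \{B \cap C;\ B \in \mathcal{B}_{n-1},\ C \in \mathcal{C}_{n-1},\ B \cap C \ne \emptyset\}$, so that $\operatorname{card}\mathcal{A}_n \le N_{n-1}^2 \le N_n$. The Python sketch below (partitions are lists of frozensets; the toy sequences on $\{0, \ldots, 19\}$ are ours) implements this and checks admissibility and the required property. Treat it as one possible answer, not as the book's solution.

```python
def N(n):
    return 1 if n == 0 else 2 ** (2 ** n)

def merge(B, C):
    """A_0 = {T}; for n >= 1, A_n = nonempty intersections of elements of B_{n-1} and C_{n-1}."""
    T = frozenset().union(*B[0])
    A = [[T]]
    for n in range(1, len(B)):
        A.append([b & c for b in B[n - 1] for c in C[n - 1] if b & c])
    return A

def is_admissible(seq, T):
    for n, partition in enumerate(seq):
        if len(partition) > N(n) or frozenset().union(*partition) != T:
            return False
        if sum(len(A) for A in partition) != len(T):        # pieces are disjoint
            return False
        if n >= 1 and not all(any(A <= P for P in seq[n - 1]) for A in partition):
            return False                                    # the sequence is increasing
    return True

T = frozenset(range(20))
# Toy admissible sequences: B by intervals, C by residues.
B = [[T],
     [frozenset(range(0, 10)), frozenset(range(10, 20))],
     [frozenset(range(a, a + 5)) for a in range(0, 20, 5)],
     [frozenset({x}) for x in T]]
C = [[T],
     [frozenset(x for x in T if x % 2 == r) for r in range(2)],
     [frozenset(x for x in T if x % 4 == r) for r in range(4)],
     [frozenset({x}) for x in T]]
A = merge(B, C)
assert is_admissible(B, T) and is_admissible(C, T) and is_admissible(A, T)
for n in range(1, len(A)):
    for piece in A[n]:
        assert any(piece <= b for b in B[n - 1]) and any(piece <= c for c in C[n - 1])
print([len(partition) for partition in A])
```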


The following simple property should be clear in the reader’s mind:
Exercise 2.7.8


(a) Prove that for $n \ge 0$, we have

$$2^{n/2}e_n(T) \le L\gamma_2(T,d). \tag{2.57}$$

Hint: Observe that $2^{n/2}\max\{\Delta(A);\ A \in \mathcal{A}_n\} \le \sup_{t\in T}\sum_{k\ge0}2^{k/2}\Delta(A_k(t))$.
(b) Prove that, equivalently, for $\epsilon > 0$, we have

$$\epsilon\sqrt{\log N(T,d,\epsilon)} \le L\gamma_2(T,d).$$

The reader should compare (2.57) with (2.56).


Exercise 2.7.9 Use (2.57) and Exercise 2.5.9(e) to prove that if $T \subset \mathbb{R}^m$, then

$$\sum_{n\ge0}2^{n/2}e_n(T) \le L\log(m+1)\gamma_2(T,d). \tag{2.58}$$

In words, Dudley’s bound is never off by more than a factor of about $\log(m+1)$ in $\mathbb{R}^m$.¹⁵


15 And we have shown in Exercise 2.5.6 that it is never off by a factor more than about $\log\log\operatorname{card} T$ either.
Exercise 2.7.10 Prove that the estimate (2.58) is essentially optimal. Warning: This
requires some skill. 


Combining Theorem 2.7.2 with Definition 2.7.3 yields the following very important result:


Theorem 2.7.11 Under (2.4) and (2.1), we have

$$\mathrm{E}\sup_{t\in T}X_t \le L\gamma_2(T,d). \tag{2.59}$$

To make (2.59) of interest, we must be able to control $\gamma_2(T,d)$, i.e., we must learn how to construct admissible sequences, a topic we shall first address in Sect. 2.9.


Exercise 2.7.12 When the process $(X_t)_{t\in T}$ satisfies (2.4) but is no longer assumed to be centered, prove that

$$\mathrm{E}\sup_{s,t\in T}|X_s - X_t| \le L\gamma_2(T,d). \tag{2.60}$$

We now turn to the control of the tails of the process, which will follow by a small variation of the same argument.


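Theorem 2.7.13 Under (2.4) and (2.1), we have

$$P\Big(\sup_{s,t\in T}|X_s - X_t| \ge L\gamma_2(T,d) + Lu\Delta(T)\Big) \le L\exp(-u^2). \tag{2.61}$$

Proof We use the notation of the proof of (2.34), with $T_n$ chosen as in the proof of Theorem 2.7.2 (one point in each element of an admissible sequence $(\mathcal{A}_n)$ with $\sup_{t\in T}\sum_{n\ge0}2^{n/2}\Delta(A_n(t)) \le 2\gamma_2(T,d)$) and $\pi_n(t)$ the point of $T_n$ in $A_n(t)$. We may assume $u \ge 1$. Let us consider the smallest integer $k \ge 0$ such that $u^2 \le 2^k$, so that $2^k \le 2u^2$. Consider the event $\Omega_1$ defined by

$$\forall t \in T_k,\quad |X_t - X_{t_0}| \le 4u\Delta(T), \tag{2.62}$$

so that by the union bound,

$$P(\Omega_1^c) \le 2^{2^k}\cdot 2\exp(-8u^2) \le 2\exp(2u^2 - 8u^2) \le \exp(-u^2). \tag{2.63}$$

Consider the event $\Omega_2$ given by

$$\forall n \ge k,\ \forall t \in T,\quad |X_{\pi_{n+1}(t)} - X_{\pi_n(t)}| \le 2^{n/2+2}\Delta(A_n(t)), \tag{2.64}$$

so that by the union bound again,

$$P(\Omega_2^c) \le \sum_{n\ge k}2^{2^{n+1}}\cdot 2\exp(-2^{n+3}) \le \sum_{n\ge k}2\exp(-2^{n+2}) \le 4\exp(-u^2), \tag{2.65}$$

using, for example, in the last inequality that $2^{n+2} \ge 2^{k+2} + n - k \ge 4u^2 + n - k$. Consequently, $P(\Omega_1 \cap \Omega_2) \ge 1 - 5\exp(-u^2)$, and on this event, we have

$$|X_t - X_{\pi_k(t)}| \le \sum_{n\ge k}2^{n/2+2}\Delta(A_n(t)) \le L\gamma_2(T,d),$$

so that $|X_t - X_{t_0}| \le |X_t - X_{\pi_k(t)}| + |X_{\pi_k(t)} - X_{t_0}| \le L\gamma_2(T,d) + Lu\Delta(T)$. □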


Let us note in particular that using (2.24),

$$\Big(\mathrm{E}\sup_{s,t\in T}|X_s - X_t|^p\Big)^{1/p} \le L\sqrt{p}\,\gamma_2(T,d). \tag{2.66}$$

Needless to say, we will look for extensions of Theorem 2.7.13. We can prove right away a particularly elegant result (due independently to R. Latała and S. Mendelson). Let us consider a process $(X_t)_{t\in T}$, which is assumed to be centered but need not be symmetric. For $n \ge 0$, consider the distance $\delta_n$ on $T$ given by $\delta_n(s,t) = \|X_s - X_t\|_{2^n}$. Denote by $\Delta_n(A)$ the diameter of a subset $A$ of $T$ for the distance $\delta_n$.


Theorem 2.7.14 Consider an admissible sequence $(\mathcal{A}_n)_{n\ge0}$ of partitions of $T$. Then,

$$\mathrm{E}\sup_{s,t\in T}|X_s - X_t| \le L\sup_{t\in T}\sum_{n\ge0}\Delta_n(A_n(t)). \tag{2.67}$$

Moreover, given $u \ge 1$ and the largest integer $k$ with $2^k \le u^2$, we have

$$P\Big(\sup_{s,t\in T}|X_s - X_t| \ge L\Delta_k(T) + L\sup_{t\in T}\sum_{n\ge0}\Delta_n(A_n(t))\Big) \le L\exp(-u^2). \tag{2.68}$$


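Proof The increment condition (2.4) will be replaced by the following: For a r.v. $Y$ and $p \ge 1$, we have

$$P(|Y| \ge u) = P(|Y|^p \ge u^p) \le \Big(\frac{\|Y\|_p}{u}\Big)^p. \tag{2.69}$$

Let us then consider the points $\pi_n(t)$ as usual. For $u \ge 1$, let us consider the event $\Omega_u$ defined by¹⁶

$$\forall n \ge 1,\quad |X_{\pi_n(t)} - X_{\pi_{n+1}(t)}| \le u\Delta_n(A_n(t)), \tag{2.70}$$

so that by the union bound and (2.69) (used with $p = 2^n$, there being at most $N_{n+1} = 4^{2^n}$ possible pairs $(\pi_n(t), \pi_{n+1}(t))$), for $u \ge 8$, we have

$$P(\Omega_u^c) \le \sum_{n\ge1}\Big(\frac{4}{u}\Big)^{2^n} \le \sum_{k\ge2}\Big(\frac{4}{u}\Big)^k \le \frac{L}{u^2}. \tag{2.71}$$

On $\Omega_u$, summation of the inequalities (2.70) for $n \ge 1$ yields $\sup_{t\in T}|X_t - X_{\pi_1(t)}| \le u\sup_{t\in T}\sum_{n\ge1}\Delta_n(A_n(t))$. Combining with (2.71), we obtain

$$\mathrm{E}\sup_{t\in T}|X_t - X_{\pi_1(t)}| \le L\sup_{t\in T}\sum_{n\ge1}\Delta_n(A_n(t)).$$

Since $\pi_1(t)$ takes at most $N_1 = 4$ values, $\mathrm{E}\sup_{t\in T}|X_{\pi_1(t)} - X_{\pi_0(t)}| \le L\Delta_0(T)$, so we have $\mathrm{E}\sup_{t\in T}|X_t - X_{\pi_0(t)}| \le L\sup_{t\in T}\sum_{n\ge0}\Delta_n(A_n(t))$, and (2.67) follows. The proof of (2.68) is nearly identical to the proof of (2.61) and is left to the reader. □

16 We are following here the general method outlined at the end of Sect. 2.4.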


2.8 Functionals 


Given a metric space $(T,d)$, how do we calculate $\gamma_2(T,d)$? Of course there is no free lunch. The quantity $\gamma_2(T,d)$ reflects a highly non-trivial geometric characteristic of the metric space. This geometry must be understood in order to compute $\gamma_2(T,d)$. There are unsolved problems in this book (such as Conjecture 17.1.4) which boil down to estimating $\gamma_2(T,d)$ for a certain metric space.

In this section, we introduce functionals, which are an efficient way to bring up the geometry of a metric space and to build competent admissible sequences, providing upper bounds for $\gamma_2(T,d)$. We will say that a map $F$ is a functional on a set $T$ if to each subset $H$ of $T$ it associates a number $F(H) \ge 0$ and if it is increasing, i.e.,

$$H \subset H' \subset T \Rightarrow F(H) \le F(H'). \tag{2.72}$$

Intuitively a functional is a measure of “size” for the subsets of $T$. It allows us to identify which subsets of $T$ are “large” for our purposes. A first example is given by $F(H) = \Delta(H)$. In the same direction, a fundamental example of a functional is

$$F(H) = \gamma_2(H,d). \tag{2.73}$$

A second example, equally important, is the quantity

$$F(H) = \mathrm{E}\sup_{t\in H}X_t \tag{2.74}$$

where $(X_t)_{t\in T}$ is a given process indexed by $T$ and satisfying (2.4).
For our purposes, the relevant property of functionals is by no means intuitively
obvious yet (but we shall soon see that the functional (2.73) does enjoy this 
property). Let us first try to explain it in words: if a set is the union of many small 
pieces far enough from each other, then this set is significantly larger (as measured 
by the functional) than the smallest of its pieces. “Significantly larger” depends on 
the scale of the pieces and on their number. This property will be called a “growth 
condition”. 

Let us address a secondary point before we give definitions. We denote by $B(t,r)$ the ball centered at $t$ of radius $r$, and we note that

$$\Delta(B(t,r))\le 2r\,.$$


This factor 2 is a nuisance. It is qualitatively the same to say that a set is contained in 
a ball of small radius or has small diameter, but quantitatively we have to account for 
this factor 2. In countless constructions, we will produce sets A which are “small” 
because they are contained in a ball of small radius r. Either we keep track of this 
property, which is cumbersome, or we control the size of A through its diameter and 
we deal with this inelegant factor 2. We have chosen here the second method.¹⁷

What do we mean by “small pieces far from each other”? There is a scale $a>0$ at which this happens and a parameter $r\ge 8$ which gives us some room. The pieces are small at that scale: they are contained in balls with radius $2a/r$.¹⁸ The balls are far from each other: any two centers of such balls are at mutual distance $\ge a$. The reason why we require $r\ge 8$ is that we want the following: Two points taken in different balls with radius $2a/r$ whose centers are at distance $\ge a$ cannot be too close to each other. This would not be true for, say, $r=4$, so we give ourselves some room and take $r\ge 8$. Here is the formal definition.
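The role of the condition $r\ge 8$ can be checked with the triangle inequality: if $s\in B(t_\ell,2a/r)$ and $s'\in B(t_{\ell'},2a/r)$ with $d(t_\ell,t_{\ell'})\ge a$, then

$$d(s,s')\ \ge\ d(t_\ell,t_{\ell'})-d(s,t_\ell)-d(s',t_{\ell'})\ \ge\ a-\frac{4a}{r}\ \ge\ \frac{a}{2}\qquad(r\ge 8)\,,$$

whereas for $r=4$ the same computation only gives $d(s,s')\ge 0$.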


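Definition 2.8.1 Given $a>0$ and an integer $r\ge 8$, we say that subsets $H_1,\dots,H_m$ of $T$ are $(a,r)$-separated if

$$\forall \ell\le m,\quad H_\ell\subset B(t_\ell,2a/r)\,, \tag{2.75}$$

where the points $t_1,t_2,\dots,t_m$ in $T$ satisfy

$$\forall \ell,\ell'\le m,\quad \ell\ne\ell'\ \Rightarrow\ a\le d(t_\ell,t_{\ell'})\le 2ar\,. \tag{2.76}$$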


A secondary feature of this definition is that the small pieces $H_\ell$ are not only well separated (on a scale $a$), but they are in the “same region of $T$” (on the larger scale $ra$). This is the content of the last inequality in condition (2.76).
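Definition 2.8.1 is easy to test mechanically for finitely many points. The helper below is a hypothetical sketch (not from the text) that checks (2.75) and (2.76) for given pieces and centers; it assumes closed balls and a user-supplied distance function, here the Euclidean distance `math.dist`.

```python
import math
from itertools import combinations

def is_ar_separated(pieces, centers, a, r, dist):
    """Check (2.75)-(2.76): each H_l lies in the closed ball B(t_l, 2a/r), and
    a <= d(t_l, t_l') <= 2*a*r for l != l'."""
    if r < 8 or a <= 0 or len(pieces) != len(centers):
        return False
    radius = 2 * a / r
    for piece, center in zip(pieces, centers):
        if any(dist(s, center) > radius for s in piece):
            return False
    return all(a <= dist(s, t) <= 2 * a * r for s, t in combinations(centers, 2))
```

Three pieces around the vertices $(0,0)$, $(1,0)$, $(0,1)$ of the plane are $(1,8)$-separated as long as each piece stays within distance $1/4$ of its vertex.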


Exercise 2.8.2 Find interesting examples of metric spaces for which there are no points $t_1,\dots,t_m$ as in (2.76), for all large enough values of $m$.


¹⁷ The opposite choice was made in [132].
¹⁸ This coefficient 2 is motivated by the considerations of the previous paragraph.
Now, what does “the union of the pieces is significantly larger than the smallest of these pieces” mean? This is an “additive property”, not a multiplicative one. In this first version of the growth condition, it means that the size of this union is larger than the size of the smallest piece by a quantity $a\sqrt{\log N}$ where $N$ is the number of pieces.¹⁹ Well, sometimes it will only be larger by a quantity of say $a\sqrt{\log N}/100$. This is how the parameter $c^*$ below comes into the picture. One could also multiply the functionals by a suitable constant (i.e., $1/c^*$) to always reduce to the case $c^*=1$, but this is a matter of taste.

Another feature is that we do not need to consider the case with $N$ pieces for a general value of $N$, but only for the case where $N=N_n$ for some $n$. This is because we care about the value of $\log N$ only within, say, a factor of 2, and this is precisely what motivated the definition of $N_n$. In order to understand the definition below, one should also recall that $\sqrt{\log N_n}$ is about $2^{n/2}$.
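The following small check illustrates two facts about $N_n$ used repeatedly here, assuming the convention $N_0=1$ and $N_n=2^{2^n}$ for $n\ge1$: $\sqrt{\log N_n}=2^{n/2}\sqrt{\log 2}$, and $N_n^2=N_{n+1}$ for $n\ge1$ (so that $N_n^2\le N_{n+1}$ for all $n\ge0$).

```python
import math

def N(n):
    """N_0 = 1 and N_n = 2**(2**n) for n >= 1 (the convention assumed here)."""
    return 1 if n == 0 else 2 ** (2 ** n)

# sqrt(log N_n) = 2**(n/2) * sqrt(log 2): the ratio below is the constant sqrt(log 2).
for n in range(1, 6):
    ratio = math.sqrt(math.log(N(n))) / 2 ** (n / 2)
    print(n, N(n), round(ratio, 4), N(n) ** 2 == N(n + 1))
```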


Definition 2.8.3 We say that the functional $F$ satisfies the growth condition with parameters $r\ge 8$ and $c^*>0$ if for any integer $n\ge 1$ and any $a>0$ the following holds true, where $m=N_n$: For each collection of subsets $H_1,\dots,H_m$ of $T$ that are $(a,r)$-separated, we have

$$F\Big(\bigcup_{\ell\le m}H_\ell\Big)\ge c^*a2^{n/2}+\min_{\ell\le m}F(H_\ell)\,. \tag{2.77}$$


This definition is motivated by the fundamental fact that when $(X_t)_{t\in T}$ is a Gaussian process, the functional (2.74) satisfies a form of the growth condition (see Proposition 2.10.8).

The following illustrates how we might use the first part of (2.76): 


Exercise 2.8.4 Let $(T,d)$ be isometric to a subset of $\mathbb{R}^k$ provided with the distance induced by a norm. Prove that in order to check that a functional satisfies the growth condition of Definition 2.8.3, it suffices to consider the values of $n$ for which $N_{n+1}\le(1+2r)^k$. Hint: It follows from (2.47) that for larger values of $n$ and $m=N_n$, there are no points $t_1,\dots,t_m$ as in (2.76).


You may find it hard to give simple examples of functionals which satisfy the growth condition (2.77). It will become gradually apparent that this condition imposes strong restrictions on the metric space $(T,d)$ and in particular a control from above of the quantity $\gamma_2(T,d)$. It bears repeating that $\gamma_2(T,d)$ reflects the geometry of the space $(T,d)$. Once this geometry is understood, it is usually possible to guess a good choice for the functional $F$. Many examples will be given in subsequent chapters.

As we show now, we really have no choice. Functionals with the growth property are intimately connected with the quantity $\gamma_2(T,d)$.


¹⁹ We remind the reader that the function $\sqrt{\log y}$ arises from the fact that it is the inverse of the function $\exp(x^2)$.

Proposition 2.8.5 Assume $r\ge 16$. Then the functional $F(H)=\gamma_2(H,d)$ satisfies the growth condition with parameters $r$ and $c^*=1/8$.


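Proof Let $m=N_n$ and consider points $(t_\ell)_{\ell\le m}$ of $T$ with $d(t_\ell,t_{\ell'})\ge a$ if $\ell\ne\ell'$. Consider sets $H_\ell\subset B(t_\ell,a/8)$ and the set $H=\bigcup_{\ell\le m}H_\ell$. We have to prove that

$$\gamma_2(H,d)\ge\frac18a2^{n/2}+\min_{\ell\le m}\gamma_2(H_\ell,d)\,. \tag{2.78}$$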
8 l<m 


Consider an admissible sequence of partitions $(\mathcal{A}_n)$ of $H$, and consider the set

$$I_n=\{\ell\le m\,;\ \exists A\in\mathcal{A}_{n-1},\ A\subset H_\ell\}\,.$$


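Picking for $\ell\in I_n$ an arbitrary element $A\in\mathcal{A}_{n-1}$ with $A\subset H_\ell$ defines a one-to-one map from $I_n$ to $\mathcal{A}_{n-1}$. Thus, $\operatorname{card}I_n\le\operatorname{card}\mathcal{A}_{n-1}\le N_{n-1}<m=N_n$. Hence, there exists $\ell_0\notin I_n$. Next, we prove that for $t\in H_{\ell_0}$, we have

$$\Delta(A_{n-1}(t))\ge\Delta(A_{n-1}(t)\cap H_{\ell_0})+\frac a4\,. \tag{2.79}$$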


Since $\ell_0\notin I_n$, we have $A_{n-1}(t)\not\subset H_{\ell_0}$, so that since $A_{n-1}(t)\subset H$, the set $A_{n-1}(t)$ must intersect a set $H_\ell$ with $\ell\ne\ell_0$, and consequently it intersects the ball $B(t_\ell,a/8)$. Since $t\in H_{\ell_0}$, we have $d(t,B(t_\ell,a/8))\ge a/2$. Since $t\in A_{n-1}(t)$, this implies that $\Delta(A_{n-1}(t))\ge a/2$. This proves (2.79) since $\Delta(A_{n-1}(t)\cap H_{\ell_0})\le\Delta(H_{\ell_0})\le a/4$.
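Now, since for each $k\ge0$ we have $\Delta(A_k(t))\ge\Delta(A_k(t)\cap H_{\ell_0})$, we have

$$\sum_{k\ge0}2^{k/2}\big(\Delta(A_k(t))-\Delta(A_k(t)\cap H_{\ell_0})\big)\ge 2^{(n-1)/2}\big(\Delta(A_{n-1}(t))-\Delta(A_{n-1}(t)\cap H_{\ell_0})\big)\ge\frac14a2^{(n-1)/2}\,,$$

where we have used (2.79) in the last inequality, and, consequently,

$$\sum_{k\ge0}2^{k/2}\Delta(A_k(t))\ge\frac14a2^{(n-1)/2}+\sum_{k\ge0}2^{k/2}\Delta(A_k(t)\cap H_{\ell_0})\,. \tag{2.80}$$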
k>0 k>0 


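Next, consider the admissible sequence $(\mathcal{A}'_n)$ of $H_{\ell_0}$ given by $\mathcal{A}'_n=\{A\cap H_{\ell_0}\,;\ A\in\mathcal{A}_n\}$. We have by definition

$$\sup_{t\in H_{\ell_0}}\sum_{k\ge0}2^{k/2}\Delta(A_k(t)\cap H_{\ell_0})\ge\gamma_2(H_{\ell_0},d)\,.$$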
teHey 450 
Hence, taking the supremum over $t$ in $H_{\ell_0}$ in (2.80), we get

$$\sup_{t\in H_{\ell_0}}\sum_{k\ge0}2^{k/2}\Delta(A_k(t))\ge\frac14a2^{(n-1)/2}+\gamma_2(H_{\ell_0},d)\ge\frac18a2^{n/2}+\min_{\ell\le m}\gamma_2(H_\ell,d)\,.$$

Since the admissible sequence $(\mathcal{A}_n)$ is arbitrary, we have proved (2.78). □


2.9 Partitioning Schemes 


In this section, we use functionals satisfying the growth condition to construct 
admissible sequences of partitions. The basic result is as follows: 


Theorem 2.9.1 Assume that there exists on $T$ a functional $F$ which satisfies the growth condition of Definition 2.8.3 with parameters $r$ and $c^*$. Then²⁰

$$\gamma_2(T,d)\le\frac{Lr}{c^*}F(T)+Lr\Delta(T)\,. \tag{2.81}$$


This theorem and its generalizations form the backbone of this book. The essence 
of this theorem is that it produces (by actually constructing them) a sequence of 
partitions that witnesses the inequality (2.81). For this reason, it could be called “the fundamental partitioning theorem”.


Exercise 2.9.2 Consider a metric space $T$ consisting of exactly two points. Prove that the functional given by $F(H)=0$ for each $H\subset T$ satisfies the growth condition of Definition 2.8.3 for $r=8$ and any $c^*>0$. Explain why we cannot replace (2.81) by the inequality $\gamma_2(T,d)\le LrF(T)/c^*$.


Let us first stress the following trivial fact (connected to Exercise 2.5.9 (a)). It 
will be used many times. The last statement of (a) is particularly useful. 


Lemma 2.9.3 


(a) Consider an integer $N$. If we cannot cover $T$ by at most $N-1$ balls of radius $a$, then there exist points $(t_\ell)_{\ell\le N}$ with $d(t_\ell,t_{\ell'})\ge a$ for $\ell\ne\ell'$. In particular if $e_n(T)>a$, we can find points $(t_\ell)_{\ell\le N_n}$ with $d(t_\ell,t_{\ell'})\ge a$ for $\ell\ne\ell'$.

(b) Assume that any sequence $(t_\ell)_{\ell\le m}$ with $d(t_\ell,t_{\ell'})\ge a$ for $\ell\ne\ell'$ satisfies $m\le N$. Then, $T$ can be covered by $N$ balls of radius $a$.

(c) Consider points $(t_\ell)_{\ell\le N_n+1}$ such that $d(t_\ell,t_{\ell'})\ge a$ for $\ell\ne\ell'$. Then, $e_n(T)\ge a/2$.


²⁰ It is certain that as $r$ grows, we must obtain a weaker result. The dependence of the right-hand side of (2.81) on $r$ is not optimal. It may be improved with further work.

Proof


(a) We pick the points $t_\ell$ recursively with $d(t_\ell,t_{\ell'})\ge a$ for $\ell'<\ell$. By hypothesis, the balls of radius $a$ centered on the previously constructed points do not cover the space if there are $<N$ of them, so that the construction continues until we have constructed $N$ points.

(b) You can either view this as a reformulation of (a) or argue directly that when $m$ is taken as large as possible the balls $B(t_\ell,a)$ cover $T$.

(c) If $T$ is covered by sets $(B_\ell)_{\ell\le N_n}$, by the pigeonhole principle, at least two of the points $t_\ell$ must fall into one of these sets, which therefore cannot be a ball of radius $<a/2$. □
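The recursive choice of points in the proof of (a) and (b) is a greedy algorithm. In a finite metric space it can be sketched as follows (the helper names are hypothetical): the retained points are $a$-separated, and since every discarded point is within distance $<a$ of a retained one, the balls of radius $a$ centered at the retained points cover the space, as in (b).

```python
def greedy_separated(points, a, dist):
    """Scan the points in order and keep a point whenever it is at distance >= a
    from every point kept so far (the recursive choice in Lemma 2.9.3 (a))."""
    kept = []
    for p in points:
        if all(dist(p, q) >= a for q in kept):
            kept.append(p)
    return kept

def covers(centers, points, a, dist):
    """True if every point lies at distance < a from some center."""
    return all(any(dist(p, c) < a for c in centers) for p in points)

def line_dist(x, y):
    """The real line, used as a simple metric space."""
    return abs(x - y)

pts = [0.0, 0.3, 0.9, 1.0, 2.5, 2.6]
kept = greedy_separated(pts, 1.0, line_dist)
print(kept, covers(kept, pts, 1.0, line_dist))
```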


The admissible sequence of partitions witnessing (2.81) will be constructed by 
recursive application of the following basic principle: 


Lemma 2.9.4 Under the conditions of Theorem 2.9.1, consider $B\subset T$ with $\Delta(B)\le 2r^{-j}$ for a certain $j\in\mathbb{Z}$, and consider any $n\ge 0$. Let $m=N_n$. Then we can find a partition $(A_\ell)_{\ell\le m}$ of $B$ into sets which have either of the following properties:

$$\Delta(A_\ell)\le 2r^{-j-1}\,, \tag{2.82}$$

or else

$$t\in A_\ell\ \Rightarrow\ F(B\cap B(t,2r^{-j-2}))\le F(B)-c^*2^{n/2}r^{-j-1}\,. \tag{2.83}$$


In words, each piece of the partition has one of two further properties. Either (case (2.82)) we have reduced the bound on its diameter from $2r^{-j}$ for $B$ to $2r^{-j-1}$, or else we have no new information on the diameter, but we have gathered the information (2.83).


Proof Consider the set

$$C=\{t\in B\,;\ F(B\cap B(t,2r^{-j-2}))>F(B)-c^*2^{n/2}r^{-j-1}\}\,.$$

Consider points $(t_\ell)_{\ell\le m'}$ in $C$ such that $d(t_\ell,t_{\ell'})\ge r^{-j-1}$ for $\ell\ne\ell'$. We prove that $m'<m$. For otherwise, using (2.77) for $a=r^{-j-1}$ and for the sets $H_\ell:=B\cap B(t_\ell,2r^{-j-2})$ shows that

$$F(B)\ge F\Big(\bigcup_{\ell\le m}H_\ell\Big)\ge c^*r^{-j-1}2^{n/2}+\min_{\ell\le m}F(H_\ell)>F(B)\,.$$

This contradiction proves that $m'<m$. Consequently, using Lemma 2.9.3 (b) for $N=m-1$, we may cover $C$ by $m-1$ balls $(B_\ell)_{\ell<m}$ of radius $\le r^{-j-1}$ (repeating balls if necessary). We then set $A_\ell=C\cap\big(B_\ell\setminus\bigcup_{\ell'<\ell}B_{\ell'}\big)$ for $\ell<m$ and $A_m=B\setminus C$. □
So, in picturesque terms, Lemma 2.9.4 produces many small pieces and (possibly) a large one (on which one has further information).

Before we start the proof of Theorem 2.9.1, we need the following technical fact 
which will be used many times: The sum of a geometric series is basically of the 
size of either its first or its last term. 


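Lemma 2.9.5 Consider numbers $(a_n)_{n\ge0}$, $a_n\ge0$, and assume $\sup_n a_n<\infty$. Consider $\alpha>1$, and define

$$I=\{n\ge0\,;\ \forall k\ge0,\ k\ne n,\ a_k<a_n\alpha^{|n-k|}\}\,. \tag{2.84}$$

Then $I\ne\emptyset$, and we have

$$\sum_{k\ge0}a_k\le\frac{2\alpha}{\alpha-1}\sum_{n\in I}a_n\,. \tag{2.85}$$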


Proof Let us write $k\prec n$ when $a_k\le a_n\alpha^{-|n-k|}$. This relation is a partial order: if $k\prec n$ and $n\prec p$, then $a_k\le a_n\alpha^{-|k-n|}\le a_p\alpha^{-|k-n|-|n-p|}\le a_p\alpha^{-|k-p|}$, so that $k\prec p$. We can then restate the definition of $I$:

$$I=\{n\ge0\,;\ \forall k\ge0,\ k\ne n\Rightarrow n\not\prec k\}\,.$$

In words, $I$ is the set of elements $n$ of $\mathbb{N}$ that are maximal for the partial order $\prec$. Next, we prove that for each $k$ in $\mathbb{N}$, there exists $n\in I$ with $k\prec n$. Indeed otherwise we can recursively construct an infinite sequence $n_1=k\prec n_2\prec\cdots$, and this is absurd because $a_{n_{j+1}}\ge\alpha a_{n_j}$, and we assume that the sequence $(a_n)$ is bounded.

Thus, for each $k$ in $\mathbb{N}$, there exists $n\in I$ with $k\prec n$. Then $a_k\le a_n\alpha^{-|n-k|}$, and therefore

$$\sum_{k\ge0}a_k\le\sum_{n\in I}\sum_{k\ge0}a_n\alpha^{-|n-k|}\le\frac{2\alpha}{\alpha-1}\sum_{n\in I}a_n\,.\ \square$$
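For a finite sequence of positive numbers (extended by zeros, which then never belong to $I$), the set $I$ of (2.84) and the bound (2.85) can be checked directly. This is only an illustration under those assumptions; `maximal_indices` is our name.

```python
import math

def maximal_indices(a, alpha):
    """The set I of (2.84) for a finite sequence a[0..N-1] of positive numbers:
    n belongs to I iff a[k] < a[n] * alpha**|n-k| for every k != n."""
    return [n for n in range(len(a))
            if all(a[k] < a[n] * alpha ** abs(n - k) for k in range(len(a)) if k != n)]

a = [3.0, 1.0, 5.0, 0.5, 2.0, 7.0, 1.5]
alpha = math.sqrt(2.0)
I = maximal_indices(a, alpha)
bound = 2 * alpha / (alpha - 1) * sum(a[n] for n in I)   # right-hand side of (2.85)
print(I, sum(a), bound)
```

For an increasing geometric sequence such as $1,2,4,8,16$ with $\alpha=\sqrt2$, only the last index survives, reflecting that the sum is of the size of its last term.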


Proof of Theorem 2.9.1 There is no question that this proof is the most demanding 
up to this point. The result is, however, absolutely central, on its own and also 
because several of our main results will follow the same overall scheme of proof. 

We have to construct an admissible sequence of partitions which witnesses the 
inequality (2.81). The construction of this sequence is as simple as it could be: we 
recursively use Lemma 2.9.4. More precisely, we construct an admissible sequence of partitions $\mathcal{A}_n$, and for $A\in\mathcal{A}_n$, we construct an integer $j_n(A)\in\mathbb{Z}$ with

$$\Delta(A)\le 2r^{-j_n(A)}\,. \tag{2.86}$$
We start with $\mathcal{A}_0=\{T\}$ and $j_0(T)$ the largest integer $j_0\in\mathbb{Z}$ with $\Delta(T)\le 2r^{-j_0}$, so that $2r^{-j_0}\le r\Delta(T)$. Having constructed $\mathcal{A}_n$, we construct $\mathcal{A}_{n+1}$ as follows: for each $B\in\mathcal{A}_n$, we use Lemma 2.9.4 with $j=j_n(B)$ to split $B$ into sets $(A_\ell)_{\ell\le N_n}$. If $A_\ell$ satisfies (2.82), we set $j_{n+1}(A_\ell)=j_n(B)+1$, and otherwise (since we have no new information on the diameter) we set $j_{n+1}(A_\ell)=j_n(B)$. Thus, in words, $j_{n+1}(A_\ell)=j_n(B)+1$ if $A_\ell$ is a small piece of $B$ and $j_{n+1}(A_\ell)=j_n(B)$ if $A_\ell$ is the large piece of $B$.

The sequence thus constructed is admissible, since each set $B$ in $\mathcal{A}_n$ is split into at most $N_n$ sets and since $N_n^2\le N_{n+1}$. We note also by construction that if $B\in\mathcal{A}_n$ and $A\subset B$, $A\in\mathcal{A}_{n+1}$, then

• Either $j_{n+1}(A)=j_n(B)+1$
• Or else $j_{n+1}(A)=j_n(B)$ and

$$t\in A\ \Rightarrow\ F(B\cap B(t,2r^{-j_{n+1}(A)-2}))\le F(B)-c^*2^{n/2}r^{-j_{n+1}(A)-1}\,. \tag{2.87}$$


Now we start the hard part of the proof, proving that the sequence of partitions we just constructed witnesses (2.81). For this, we fix $t\in T$. We want to prove that

$$\sum_{n\ge0}2^{n/2}\Delta(A_n(t))\le\frac{Lr}{c^*}F(T)+Lr\Delta(T)\,.$$


We set $j(n)=j_n(A_n(t))$, so that $j(n)\le j(n+1)\le j(n)+1$. We set $a(n)=2^{n/2}r^{-j(n)}$. Since $2^{n/2}\Delta(A_n(t))\le 2a(n)$, it suffices to show that

$$\sum_{n\ge0}a(n)\le\frac{Lr}{c^*}F(T)+Lr\Delta(T)\,. \tag{2.88}$$


First, we prove a side result, that for $n\ge0$ we have

$$a(n)\le\frac{Lr}{c^*}F(T)+Lr\Delta(T)\,. \tag{2.89}$$

If $n\ge1$ and $j(n-1)=j(n)$, then using (2.87) for $n-1$ rather than $n$ yields (2.89). Next, if $n\ge1$ and $j(n-1)=j(n)-1$, then $a(n)=\sqrt2r^{-1}a(n-1)\le a(n-1)$ since $r\ge8$, and iterating this relation until we reach an integer $n'$ with either $j(n'-1)=j(n')$ or $n'=0$ proves (2.89) since $a(0)=r^{-j_0}\le Lr\Delta(T)$.

In particular the sequence $(a(n))$ is bounded. Consider then the set $I$ as provided by Lemma 2.9.5 for $\alpha=\sqrt2$ and $a_n=a(n)$, that is,

$$I=\{n\ge0\,;\ \forall k\ge0,\ n\ne k,\ a(k)<a(n)2^{|k-n|/2}\}\,.$$
Recalling that $a(0)=r^{-j_0}\le r\Delta(T)/2$, it suffices to prove that

$$\sum_{n\in I\setminus\{0\}}a(n)\le\frac{Lr}{c^*}F(T)\,. \tag{2.90}$$


For $n\in I$, $n\ge1$, we have $a(n+1)<\sqrt2a(n)$ and $a(n-1)<\sqrt2a(n)$. Since $a(n+1)=\sqrt2r^{j(n)-j(n+1)}a(n)$, this implies

$$j(n+1)=j(n)+1\,;\quad j(n-1)=j(n)\,. \tag{2.91}$$

Proving (2.90) is the difficult part. Assuming first that $I$ is infinite (the important case), let us enumerate the elements of $I\setminus\{0\}$ as $n_1<n_2<\dots$ so that (2.91) implies

$$j(n_k+1)=j(n_k)+1\,;\quad j(n_k-1)=j(n_k)\,. \tag{2.92}$$


In words, $n_k$ is at the end of a sequence of partition steps in which $A_{\ell+1}(t)$ was the large piece of $A_\ell(t)$, and $A_{n_k+1}(t)$ is a small piece of $A_{n_k}(t)$. Let us note that as a consequence of (2.92), we have

$$j(n_{k+1})\ge j(n_k+1)=j(n_k)+1\,.$$

The key to the proof is to show that for $k\ge1$, we have

$$a(n_k)\le\frac{Lr}{c^*}\big(f(n_k-1)-f(n_{k+2})\big) \tag{2.93}$$

where $f(n)=F(A_n(t))$. Now the sequence $(f(n))$ is decreasing because $A_n(t)\subset A_{n-1}(t)$, and $f(0)=F(T)$. When $k\ge2$, then $f(n_k-1)\le f(n_{k-1})$, so that (2.93) implies

$$a(n_k)\le\frac{Lr}{c^*}\big(f(n_{k-1})-f(n_{k+2})\big)\,. \tag{2.94}$$


Summation of the inequalities (2.94) for $k\ge2$ then yields

$$\sum_{k\ge2}a(n_k)\le\frac{Lr}{c^*}F(T)\,, \tag{2.95}$$

and combining with (2.93) for $k=1$ proves (2.90) and concludes the proof of the theorem when $I$ is infinite.
We now prove (2.93). Since $n_k\ge1$, we may define $n^*:=n_k-1$. By (2.92), we have $j(n_k-1)=j(n_k)$, i.e., $j(n^*)=j(n^*+1)$. We may then use (2.87) for $n^*$ instead of $n$, $B=A_{n^*}(t)$ and $A=A_{n_k}(t)=A_{n^*+1}(t)$ to obtain that

$$F(B\cap B(t,2r^{-j_{n^*+1}(A)-2}))\le F(B)-c^*2^{n^*/2}r^{-j_{n^*+1}(A)-1}\,.$$

Recalling that $n^*=n_k-1$, this means

$$F(B\cap B(t,2r^{-j_{n_k}(A)-2}))\le F(B)-c^*2^{(n_k-1)/2}r^{-j_{n_k}(A)-1}\,, \tag{2.96}$$

so that, since $j_{n_k}(A)=j(n_k)$ and $a(n_k)=2^{n_k/2}r^{-j(n_k)}$,

$$a(n_k)\le\frac{Lr}{c^*}\big(F(B)-F(B\cap B(t,2r^{-j_{n_k}(A)-2}))\big)\,. \tag{2.97}$$

Furthermore, by (2.92),

$$j(n_{k+2})\ge j(n_{k+1})+1\ge j(n_k)+2\,. \tag{2.98}$$

Since $j(n_{k+2})=j_{n_{k+2}}(A_{n_{k+2}}(t))$, (2.86) implies $\Delta(A_{n_{k+2}}(t))\le 2r^{-j(n_{k+2})}\le 2r^{-j(n_k)-2}$, so that

$$A_{n_{k+2}}(t)\subset B\cap B(t,2r^{-j(n_k)-2}) \tag{2.99}$$

and thus $f(n_{k+2})=F(A_{n_{k+2}}(t))\le F(B\cap B(t,2r^{-j(n_k)-2}))$. Combining with (2.97) and since $F(B)=f(n_k-1)$, we have proved (2.93).

Assuming now that $I$ is finite, it has a largest element $n_{\bar k}$. We use the previous argument to control $a(n_k)$ when $k+2\le\bar k$, and for $k=\bar k-1$ and $k=\bar k$, we simply use (2.89). □


It is important for the sequel that you fully master the previous argument. 


Exercise 2.9.6 We say that a sequence $(F_n)_{n\ge0}$ of functionals on $(T,d)$ satisfies the growth condition with parameters $r\ge8$ and $c^*>0$ if

$$\forall n\ge0,\quad F_{n+1}\le F_n$$

and if for any integer $n\ge0$ and any $a>0$ the following holds true, where $m=N_n$: For each collection of subsets $H_1,\dots,H_m$ of $T$ that are $(a,r)$-separated, we have

$$F_n\Big(\bigcup_{\ell\le m}H_\ell\Big)\ge c^*a2^{n/2}+\min_{\ell\le m}F_{n+1}(H_\ell)\,. \tag{2.100}$$
Prove that then

$$\gamma_2(T,d)\le\frac{Lr}{c^*}F_0(T)+Lr\Delta(T)\,. \tag{2.101}$$

Hint: Copy the previous arguments by replacing everywhere $F(A)$ by $F_n(A)$ when $A\in\mathcal{A}_n$.


Proposition 2.9.7 Consider a metric space $(T,d)$, and for $n\ge0$, consider subsets $T_n$ of $T$ with $\operatorname{card}T_0=1$ and $\operatorname{card}T_n\le N_n$ for $n\ge1$. Consider a number $S$, and let

$$U=\Big\{t\in T\,;\ \sum_{n\ge0}2^{n/2}d(t,T_n)\le S\Big\}\,.$$

Then $\gamma_2(U,d)\le LS$.


Proof For $H\subset U$, we define $F(H)=\inf\sup_{t\in H}\sum_{n\ge0}2^{n/2}d(t,V_n)$, where the infimum is taken over all choices of $V_n\subset T$ with $\operatorname{card}V_n\le N_n$. It is important here not to assume that $V_n\subset H$ to ensure that $F$ is increasing. We then prove that $F$ satisfies the growth condition by an argument very similar to that of Proposition 2.8.5. The proof follows from Theorem 2.9.1 since $\Delta(U,d)\le 2S$, as each point of $U$ is within distance $S$ of the unique point of $T_0$. □


A slightly different partitioning scheme has recently been discovered by R. van Handel [141], and we describe a variant of it now. We consider a metric space $(T,d)$ and an integer $r\ge8$. We assume that for $j\in\mathbb{Z}$, we are given a function $s_j(t)\ge0$ on $T$.


Theorem 2.9.8 Assume that the following holds:

For each subset $A$ of $T$, for each $j\in\mathbb{Z}$ with $\Delta(A)\le 2r^{-j}$ and for each $n\ge1$, either $e_n(A)\le r^{-j-1}$ or else there exists $t\in A$ with

$$s_j(t)\ge 2^{n/2}r^{-j-1}\,. \tag{2.102}$$

Then,

$$\gamma_2(T,d)\le Lr\Big(\Delta(T)+\sup_{t\in T}\sum_{j\in\mathbb{Z}}s_j(t)\Big)\,. \tag{2.103}$$

We will show later how to construct functions $s_j(t)$ satisfying (2.102) using a functional which satisfies the growth condition.²¹

²¹ See [141] for other constructions.
The right-hand side of (2.103) is the supremum over $t$ of a sum of terms. It need not always be the same terms which will contribute the most for different values of $t$, and the bound is definitely better than if the supremum and the summation were exchanged.


Proof of Theorem 2.9.8 Consider the largest $j_0\in\mathbb{Z}$ with $\Delta(T)\le 2r^{-j_0}$, so that $2r^{-j_0}\le r\Delta(T)$. We construct by induction an increasing sequence of partitions $\mathcal{A}_n$ with $\operatorname{card}\mathcal{A}_n\le N_n$, and for $A\in\mathcal{A}_n$, we construct an integer $j_n(A)\in\mathbb{Z}$ with $\Delta(A)\le 2r^{-j_n(A)}$. We start with $\mathcal{A}_0=\mathcal{A}_1=\{T\}$ and $j_0(T)=j_1(T)=j_0$.

Once $\mathcal{A}_n$ has been constructed ($n\ge1$), we further split every element $B\in\mathcal{A}_n$. The idea is to first split $B$ into sets which are basically level sets for the function $s_j(t)$ in order to achieve the crucial relation (2.107) and then to further split each of these sets according to its metric entropy. More precisely, we may assume that $S=\sup_{t\in T}\sum_{j\in\mathbb{Z}}s_j(t)<\infty$, for there is nothing to prove otherwise. Let us set $j=j_n(B)$, and define the sets $A_k$ for $1\le k\le n$ by setting for $k<n$

$$A_k=\{t\in B\,;\ 2^{-k}S<s_j(t)\le 2^{-k+1}S\}\,, \tag{2.104}$$

and

$$A_n=\{t\in B\,;\ s_j(t)\le 2^{-n+1}S\}\,. \tag{2.105}$$

The purpose of this construction is to ensure the following:

$$k\le n\,;\ t,t'\in A_k\ \Rightarrow\ s_j(t')\le 2\big(s_j(t)+2^{-n}S\big)\,. \tag{2.106}$$

This is obvious since $s_j(t')\le 2s_j(t)$ for $k<n$ and $s_j(t')\le 2^{-n+1}S$ if $k=n$. For each set $A_k$, $k\le n$, we use the following procedure:


• If $e_{n-1}(A_k)\le r^{-j-1}$, we may cover $A_k$ by at most $N_{n-1}$ balls of radius $2r^{-j-1}$, so we may split $A_k$ into $N_{n-1}$ pieces of diameter $\le 4r^{-j-1}$. We decide that each of these pieces $A$ is an element of $\mathcal{A}_{n+1}$, for which we set $j_{n+1}(A)=j+1$. Thus, $\Delta(A)\le 4r^{-j-1}=4r^{-j_{n+1}(A)}$.

• Otherwise we decide that $A_k\in\mathcal{A}_{n+1}$, and we set $j_{n+1}(A_k)=j$. Thus, $\Delta(A_k)\le 2r^{-j}=2r^{-j_{n+1}(A_k)}$. From (2.102), there exists $t'\in A_k$ for which $s_j(t')\ge 2^{(n-1)/2}r^{-j-1}$. Then by (2.106), we have

$$\forall t\in A_k\,;\quad 2^{(n-1)/2}r^{-j-1}\le 2\big(s_j(t)+2^{-n}S\big)\,. \tag{2.107}$$

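In summary, if $B\in\mathcal{A}_n$ and $A\in\mathcal{A}_{n+1}$, $A\subset B$, then

• Either $j_{n+1}(A)=j_n(B)+1$
• Or else $j_{n+1}(A)=j_n(B)$ and, from (2.107),

$$\forall t\in A\,;\quad 2^{(n-1)/2}r^{-j_{n+1}(A)-1}\le 2\big(s_{j_{n+1}(A)}(t)+2^{-n}S\big)\,. \tag{2.108}$$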
This completes the construction. Now for $n\ge1$, we have $n\le N_{n-1}$ so that $\operatorname{card}\mathcal{A}_{n+1}\le nN_{n-1}N_n\le N_{n+1}$ and the sequence $(\mathcal{A}_n)$ is admissible. Next, we fix $t\in T$. We set $j_n=j_n(A_n(t))$, and we observe that by construction $j_n\le j_{n+1}\le j_n+1$. Since $\Delta(A_n(t))\le 4r^{-j_n}$, we have $2^{n/2}\Delta(A_n(t))\le 4a(n)$ where $a(n):=2^{n/2}r^{-j_n}$. To complete the argument, we prove that

$$\sum_{n\ge0}a(n)\le Lr\big(S+\Delta(T)\big)\,. \tag{2.109}$$


For this, consider the set $I$ provided by Lemma 2.9.5 for $\alpha=\sqrt2$, so that since $r^{-j_0}\le 2r\Delta(T)$ it suffices to prove that

$$\sum_{n\in I\setminus\{0\}}a(n)\le LrS\,. \tag{2.110}$$


For $n\in I\setminus\{0\}$, it holds that $j_{n-1}=j_n<j_{n+1}$ (since otherwise this contradicts the definition of $I$). In particular, the integers $j_n$ for $n\in I$ are all different so that $\sum_{n\in I}s_{j_n}(t)\le S$. Using (2.108) for $n-1$ instead of $n$ yields $2^{(n-2)/2}r^{-j_n-1}\le 2\big(s_{j_n}(t)+2^{-n+1}S\big)$. Since $j_{n-1}=j_n$, we get

$$a(n)\le Lr\big(s_{j_n}(t)+2^{-n}S\big)\,,$$

and summing these relations, we obtain the desired result. □
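The level-set split (2.104)-(2.105) and the property (2.106) can be illustrated on finitely many values. In this sketch (hypothetical names), `values` maps points of $B$ to $s_j(t)$, and we assume $0\le s_j(t)\le S$ for every point.

```python
def level_sets(values, S, n):
    """Split points according to s = values[point] as in (2.104)-(2.105):
    A_k = {2**-k * S < s <= 2**(1-k) * S} for 1 <= k < n, and A_n = {s <= 2**(1-n) * S}.
    Assumes 0 <= s <= S for every point."""
    sets = {k: [] for k in range(1, n + 1)}
    for point, s in values.items():
        k = 1
        while k < n and not (s > 2.0 ** -k * S):  # first k with s > 2^{-k} S, capped at n
            k += 1
        sets[k].append(point)
    return sets

def satisfies_2_106(values, sets, S, n):
    """Check (2.106): s(t') <= 2 (s(t) + 2^{-n} S) whenever t, t' lie in the same A_k."""
    return all(values[u] <= 2 * (values[v] + 2.0 ** -n * S)
               for block in sets.values() for u in block for v in block)

values = {"a": 1.0, "b": 0.6, "c": 0.3, "d": 0.01, "e": 0.0}
sets = level_sets(values, S=1.0, n=3)
print(sets, satisfies_2_106(values, sets, 1.0, 3))
```

The additive term $2^{-n}S$ is what allows the last set $A_n$ to collect all the small values at once, at the price of the harmless term $2^{-n}S$ in the final summation.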
The following connects Theorems 2.9.1 and 2.9.8: 


Proposition 2.9.9 Assume that the functional $F$ satisfies the growth condition with parameters $r$ and $c^*$. Then the functions

$$s_j(t)=\frac1{c^*}\Big(F\big(B(t,2r^{-j+1})\big)-F\big(B(t,2r^{-j-2})\big)\Big)$$

satisfy (2.102).
Proof Consider a subset $A$ of $T$, $j\in\mathbb{Z}$ with $\Delta(A)\le 2r^{-j}$ and $n\ge1$. Let $m=N_n$. If $e_n(A)>r^{-j-1}$, then by Lemma 2.9.3, we may find $(t_\ell)_{\ell\le m}$ in $A$ with $d(t_\ell,t_{\ell'})\ge r^{-j-1}$ for $\ell\ne\ell'$. Consider the sets $H_\ell=B(t_\ell,2r^{-j-2})$ so that by (2.77) used for $a=r^{-j-1}$, it holds that

$$F\Big(\bigcup_{\ell\le m}H_\ell\Big)\ge c^*r^{-j-1}2^{n/2}+\min_{\ell\le m}F(H_\ell)\,. \tag{2.111}$$

Let us now consider $\ell_0\le m$ such that $F(H_{\ell_0})$ achieves the minimum in the right-hand side, so that $\min_{\ell\le m}F(H_\ell)=F(B(t_{\ell_0},2r^{-j-2}))$. The crude inequality $2r^{-j-2}+2r^{-j}\le 2r^{-j+1}$ implies that $H_\ell\subset B(t_{\ell_0},2r^{-j+1})$ for each $\ell$, so that $F\big(\bigcup_{\ell\le m}H_\ell\big)\le F\big(B(t_{\ell_0},2r^{-j+1})\big)$. Then (2.111) implies

$$F\big(B(t_{\ell_0},2r^{-j+1})\big)\ge c^*r^{-j-1}2^{n/2}+F\big(B(t_{\ell_0},2r^{-j-2})\big)\,,$$

i.e., $s_j(t_{\ell_0})\ge 2^{n/2}r^{-j-1}$. □


Despite the fact that the proof of Theorem 2.9.8 is a few lines shorter than the 
proof of Theorem 2.9.1, in the various generalizations of this principle, we will 
mostly follow the scheme of proof of Theorem 2.9.1. The reason for this choice 
is simple: it should help the reader that our various partition theorems follow a 
common pattern. The most difficult partition theorem we present is Theorem 6.2.8 
(the Latała-Bednorz theorem), which is one of the highlights of this work, and it is
not clear at this point whether the method of Theorem 2.9.8 can be adapted to the 
proof of this theorem. 

The following simple observation allows us to construct a sequence which is 
admissible from one which is slightly too large. It will be used several times. 


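Lemma 2.9.10 Consider $\alpha>0$, an integer $\tau\ge0$, and an increasing sequence of partitions $(\mathcal{B}_n)_{n\ge0}$ with $\operatorname{card}\mathcal{B}_n\le N_{n+\tau}$. Let

$$S:=\sup_{t\in T}\sum_{n\ge0}2^{n/\alpha}\Delta(B_n(t))\,.$$

Then we can find an admissible sequence of partitions $(\mathcal{A}_n)_{n\ge0}$ such that

$$\sup_{t\in T}\sum_{n\ge0}2^{n/\alpha}\Delta(A_n(t))\le 2^{\tau/\alpha}\big(S+K(\alpha)\Delta(T)\big)\,. \tag{2.112}$$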


Of course (for the last time) here $K(\alpha)$ denotes a number depending on $\alpha$ only (that need not be the same at each occurrence).


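Proof We set $\mathcal{A}_n=\{T\}$ if $n<\tau$ and $\mathcal{A}_n=\mathcal{B}_{n-\tau}$ if $n\ge\tau$, so that $\operatorname{card}\mathcal{A}_n\le N_n$ and

$$\sum_{n\ge\tau}2^{n/\alpha}\Delta(A_n(t))=2^{\tau/\alpha}\sum_{n\ge0}2^{n/\alpha}\Delta(B_n(t))\,.$$

Using the bound $\Delta(A_n(t))\le\Delta(T)$, we obtain

$$\sum_{n<\tau}2^{n/\alpha}\Delta(A_n(t))\le K(\alpha)2^{\tau/\alpha}\Delta(T)\,.\ \square$$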


Exercise 2.9.11 Prove that (2.112) might fail if one replaces the right-hand side by $K(\alpha,\tau)S$. Hint: $S$ does not control $\Delta(T)$.
2.10 Gaussian Processes: The Majorizing Measure Theorem


Consider a Gaussian process $(X_t)_{t\in T}$, that is, a jointly Gaussian family of centered r.v.s indexed by $T$. We provide $T$ with the canonical distance

$$d(s,t)=\big(E(X_s-X_t)^2\big)^{1/2}\,. \tag{2.113}$$

Recall the functional $\gamma_2$ of Definition 2.7.3.


Theorem 2.10.1 (The Majorizing Measure Theorem) For a universal constant $L$, it holds that

$$\frac1L\gamma_2(T,d)\le E\sup_{t\in T}X_t\le L\gamma_2(T,d)\,. \tag{2.114}$$
L teT 


The reason for the name is explained in Sect. 3.1. We will meditate on this statement in Sect. 2.12. We will spend much time trying to generalize this theorem to other classes of processes. To link the statements of these generalizations with that of (2.114), it may be good to reformulate the lower bound $\gamma_2(T,d)\le LE\sup_{t\in T}X_t$ in the following general terms:

The control from above of $E\sup_{t\in T}X_t$ implies the existence of a “small” sequence of admissible partitions of $T$.


The right-hand side inequality in (2.114) is Theorem 2.7.11. To prove the lower bound, we will use Theorem 2.9.1 and the functional

$$F(H)=E\sup_{t\in H}X_t:=\sup_{H^*\subset H,\ H^*\ \text{finite}}E\sup_{t\in H^*}X_t\,. \tag{2.115}$$

For this, we need to prove that this functional satisfies the growth condition with $c^*$ a universal constant and to bound $\Delta(T)$. We strive to give a proof that relies on general principles and lends itself to generalizations.


Lemma 2.10.2 (Sudakov Minoration) Assume that

$$\forall p,q\le m,\quad p\ne q\ \Rightarrow\ d(t_p,t_q)\ge a\,.$$

Then we have

$$E\sup_{p\le m}X_{t_p}\ge\frac a{L_1}\sqrt{\log m}\,. \tag{2.116}$$

Here and below $L_1,L_2,\dots$ are specific universal constants. Their values remain the same, at least within the same section.
The proof of the Sudakov minoration is given just after Lemma 15.2.7.


Exercise 2.10.3 Prove that Lemma 2.10.2 is equivalent to the following statement: If $(X_t)_{t\in T}$ is a Gaussian process and $d$ is the canonical distance, then

$$e_n(T,d)\le L2^{-n/2}E\sup_{t\in T}X_t\,. \tag{2.117}$$

Compare with Exercise 2.7.8.


To understand the relevance of Sudakov minoration, let us consider the case where $EX_{t_p}^2\le 100a^2$ (say) for each $p$. Then (2.116) means that the bound (2.15) is of the correct order in this situation.


Exercise 2.10.4 Prove (2.116) when the r.v.s $X_{t_p}$ are independent. That is, assume that these variables are Gaussian independent centered. Hint: Use the method of Exercise 2.3.7 (b).
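Exercise 2.10.4 can be explored numerically. If the $X_{t_p}$ are independent centered with common standard deviation $\sigma$, then $d(t_p,t_q)=\sigma\sqrt2$, so we take $a=\sigma\sqrt2$. The classical bound $E\max_{p\le m}X_{t_p}\le\sigma\sqrt{2\log m}=a\sqrt{\log m}$ shows that the ratio printed below is at most $1$, and (2.116) says it stays above $1/L_1$. This Monte Carlo sketch uses only the standard library; sample sizes and seeds are arbitrary choices, and it illustrates rather than proves.

```python
import math
import random

def expected_max_iid(m, sigma, reps=400, seed=0):
    """Monte Carlo estimate of E max_{p<=m} X_p for i.i.d. N(0, sigma^2) variables."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += max(rng.gauss(0.0, sigma) for _ in range(m))
    return total / reps

a = 1.0
sigma = a / math.sqrt(2.0)  # independent case: d(t_p, t_q) = sigma * sqrt(2) = a
for m in (10, 100, 1000):
    est = expected_max_iid(m, sigma)
    # should lie between 1/L_1 and 1, up to Monte Carlo error
    print(m, est / (a * math.sqrt(math.log(m))))
```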


Exercise 2.10.5 A natural approach (“the second moment method”) to prove that $P(\sup_{p\le m}X_{t_p}\ge u)$ is at least $1/L$ for a certain value of $u$ is as follows: consider the r.v. $Y=\sum_{p\le m}\mathbf 1_{\{X_{t_p}\ge u\}}$, prove that $EY^2\le L(EY)^2$, and then use the Paley-Zygmund inequality (6.15) to prove that $\sup_{p\le m}X_{t_p}\ge a\sqrt{\log m}/L_1$ with probability $\ge1/L$. Prove that this approach works when the r.v.s $X_{t_p}$ are independent, but find examples showing that this naive approach does not work in general to prove (2.116).


The following is a very important property of Gaussian processes and one of the 
keys to Theorem 2.10.1. It is a facet of the theory of concentration of measure, a 
leading idea of modern probability theory. We refer the reader to [52] to learn about 
this. 


Lemma 2.10.6 Consider a Gaussian process $(X_t)_{t\in U}$, where $U$ is finite, and let $\sigma=\sup_{t\in U}(EX_t^2)^{1/2}$. Then for $u>0$, we have

$$P\Big(\Big|\sup_{t\in U}X_t-E\sup_{t\in U}X_t\Big|\ge u\Big)\le 2\exp\Big(-\frac{u^2}{2\sigma^2}\Big)\,. \tag{2.118}$$

In words, the size of the fluctuations of $\sup_{t\in U}X_t$ is governed by the size of the individual r.v.s $X_t$, rather than by the (typically much larger) quantity $E\sup_{t\in U}X_t$. It is essential that the cardinality of $U$ does not appear in (2.118).


Exercise 2.10.7 Find an example of a Gaussian process for which

$$E\sup_{t\in T}X_t\gg\sigma=\sup_{t\in T}(EX_t^2)^{1/2}\,,$$

whereas the fluctuations of $\sup_{t\in T}X_t$ are of order $\sigma$, e.g., the variance of $\sup_{t\in T}X_t$ is about $\sigma^2$. Hint: $T=\{(t_i)_{i\le n}\,;\ \sum_{i\le n}t_i^2\le1\}$ and $X_t=\sum_{i\le n}t_ig_i$ where $g_i$ are independent standard Gaussian r.v.s.
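The hint can be checked by simulation. For the Euclidean unit ball, $\sup_{t\in T}X_t=(\sum_{i\le n}g_i^2)^{1/2}$ by the Cauchy-Schwarz inequality, so with $\sigma=1$ the mean is close to $\sqrt n$ while the variance stays bounded (it is close to $1/2$ for large $n$). The sketch below is our own illustration with arbitrary sample sizes and seed.

```python
import math
import random
import statistics

def sup_over_unit_ball(n, reps=2000, seed=1):
    """Samples of sup_{|t|_2 <= 1} sum_{i<=n} t_i g_i, which equals |g|_2 (Cauchy-Schwarz)."""
    rng = random.Random(seed)
    return [math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)))
            for _ in range(reps)]

samples = sup_over_unit_ball(400)
print(statistics.mean(samples))      # close to sqrt(400) = 20: E sup is large
print(statistics.variance(samples))  # close to 1/2: fluctuations of order sigma = 1
```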


Proposition 2.10.8 Consider points $(t_\ell)_{\ell\le m}$ in $T$. Assume that $d(t_\ell,t_{\ell'})\ge a$ if $\ell\ne\ell'$. Consider $\sigma>0$ and for $\ell\le m$ a finite set $H_\ell\subset B(t_\ell,\sigma)$. Then if $H=\bigcup_{\ell\le m}H_\ell$, we have

$$E\sup_{t\in H}X_t\ge\frac a{L_1}\sqrt{\log m}-L_2\sigma\sqrt{\log m}+\min_{\ell\le m}E\sup_{t\in H_\ell}X_t\,. \tag{2.119}$$

When $\sigma\le a/(2L_1L_2)$, (2.119) implies

$$E\sup_{t\in H}X_t\ge\frac a{2L_1}\sqrt{\log m}+\min_{\ell\le m}E\sup_{t\in H_\ell}X_t\,, \tag{2.120}$$

which can be seen as a generalization of Sudakov's minoration (2.116) by taking $H_\ell=\{t_\ell\}$. When $m=N_n$, (2.120) proves that the functional $F(H)=E\sup_{t\in H}X_t$ satisfies the growth condition (2.77).


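Proof We can and do assume $m\ge2$. For $\ell\le m$, we consider the r.v.

$$Y_\ell=\Big(\sup_{t\in H_\ell}X_t\Big)-X_{t_\ell}=\sup_{t\in H_\ell}(X_t-X_{t_\ell})\,.$$

For $t\in H_\ell$, we set $Z_t=X_t-X_{t_\ell}$. Since $H_\ell\subset B(t_\ell,\sigma)$, we have $EZ_t^2=d(t,t_\ell)^2\le\sigma^2$, and for $u>0$, Eq. (2.118) used for the process $(Z_t)_{t\in H_\ell}$ implies

$$P(|Y_\ell-EY_\ell|\ge u)\le 2\exp\Big(-\frac{u^2}{2\sigma^2}\Big)\,. \tag{2.121}$$

Thus, if $V=\max_{\ell\le m}|Y_\ell-EY_\ell|$, then combining (2.121) and the union bound, we get

$$P(V\ge u)\le 2m\exp\Big(-\frac{u^2}{2\sigma^2}\Big)\,, \tag{2.122}$$

and (2.13) implies

$$EV\le L_2\sigma\sqrt{\log m}\,. \tag{2.123}$$

Now, for each $\ell\le m$,

$$Y_\ell\ge EY_\ell-V\ge\min_{\ell'\le m}EY_{\ell'}-V\,,$$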
L<m 
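and thus

$$\sup_{t\in H_\ell}X_t=Y_\ell+X_{t_\ell}\ge X_{t_\ell}+\min_{\ell'\le m}EY_{\ell'}-V\,,$$

so that

$$\sup_{t\in H}X_t\ge\max_{\ell\le m}X_{t_\ell}+\min_{\ell\le m}EY_\ell-V\,.$$

Taking expectations, we obtain

$$E\sup_{t\in H}X_t\ge E\max_{\ell\le m}X_{t_\ell}+\min_{\ell\le m}EY_\ell-EV\,,$$

and we use (2.116) and (2.123), recalling that $EY_\ell=E\sup_{t\in H_\ell}X_t$ since $X_{t_\ell}$ is centered. □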


Exercise 2.10.9 Prove that (2.120) might fail if one allows $\sigma=a$. Hint: The intersection of the balls $B(t_\ell,a)$ might contain a ball with positive radius.


Exercise 2.10.10 Consider subsets $(H_\ell)_{\ell\le m}$ of $B(0,a)$ and $H=\bigcup_{\ell\le m}H_\ell$. Prove that

$$E\sup_{t\in H}X_t\le La\sqrt{\log m}+\max_{\ell\le m}E\sup_{t\in H_\ell}X_t\,. \tag{2.124}$$

Try to find improvements on this bound. Hint: Peek at (19.61).


Proof of Theorem 2.10.1 We fix $r=\max(8,4L_1L_2)$, so that $2a/r\le a/(2L_1L_2)$. The growth condition for the functional $F$ of (2.115) follows from (2.120), which implies that (2.77) holds for $c^*=1/L$. Theorem 2.9.1 implies

$$\gamma_2(T,d)\le LE\sup_{t\in T}X_t+L\Delta(T)\,.$$

To control the term $\Delta(T)$, we write that for $t_1,t_2\in T$,

$$E\max(X_{t_1},X_{t_2})=E\max(X_{t_1}-X_{t_2},0)=\frac1{\sqrt{2\pi}}d(t_1,t_2)\,,$$

so that $\Delta(T)\le\sqrt{2\pi}E\sup_{t\in T}X_t$. □
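The identity $E\max(X_{t_1},X_{t_2})=d(t_1,t_2)/\sqrt{2\pi}$ used to control $\Delta(T)$ is easy to test numerically. In this sketch we assume a pair of standard Gaussians with correlation $\rho$, for which $d(t_1,t_2)=\sqrt{2-2\rho}$; the parametrization, names and sample size are ours.

```python
import math
import random

def mc_expected_max_pair(rho, reps=100_000, seed=2):
    """Monte Carlo estimate of E max(X1, X2) where X1 = g1 and
    X2 = rho * g1 + sqrt(1 - rho^2) * g2 for independent standard Gaussians g1, g2."""
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(reps):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        total += max(g1, rho * g1 + c * g2)
    return total / reps

rho = 0.3
d = math.sqrt(2.0 - 2.0 * rho)  # canonical distance (2.113) between X1 and X2
print(mc_expected_max_pair(rho), d / math.sqrt(2.0 * math.pi))
```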


The proof of Theorem 2.10.1 displays an interesting feature. This theorem aims at understanding $E\sup_{t\in T}X_t$, and for this, we use functionals that are based on precisely this quantity. This is not a circular argument. The content of Theorem 2.10.1 is that there is simply no other way to bound a Gaussian process than to control the quantity $\gamma_2(T,d)$. The miracle of this theorem is that it relates in complete generality two quantities, namely, $E\sup_{t\in T}X_t$ and $\gamma_2(T,d)$, which are both very hard to estimate. Still, in concrete situations, to estimate these quantities, we must in some way gain understanding of the underlying geometry.
The following is a noteworthy consequence of Theorem 2.10.1: 


Theorem 2.10.11 Consider two processes $(Y_t)_{t\in T}$ and $(X_t)_{t\in T}$ indexed by the same set. Assume that the process $(X_t)_{t\in T}$ is Gaussian and that the process $(Y_t)_{t\in T}$ satisfies the increment condition

$$\forall u>0,\ \forall s,t\in T,\quad P(|Y_s-Y_t|\ge u)\le 2\exp\Big(-\frac{u^2}{2d(s,t)^2}\Big)\,, \tag{2.125}$$

where $d$ is the distance (2.113) associated with the process $X_t$. Then we have

$$E\sup_{s,t\in T}|Y_s-Y_t|\le LE\sup_{t\in T}X_t\,. \tag{2.126}$$

Processes satisfying the condition (2.125) are sometimes called sub-Gaussian. We will see many examples later (see (6.2)).

Proof We combine (2.60) with the left-hand side of (2.114). □
Let us also note the following consequence of (2.126) and Lemma 2.2. 1:77 


Corollary 2.10.12 Consider two Gaussian processes $(X_t)_{t\in T}$ and $(Y_t)_{t\in T}$. Assume that

$$
\forall s,t\in T,\quad E(Y_s - Y_t)^2 \le E(X_s - X_t)^2 .
$$

Then,

$$
E\sup_{t\in T} Y_t \le L E\sup_{t\in T} X_t . \tag{2.127}
$$


2.11 Gaussian Processes as Subsets of a Hilbert Space 


In this section, we learn to think of a Gaussian process as a subset of a Hilbert space. 
This will reveal our lack of understanding of basic geometric questions. 

First, consider a Gaussian process $(Y_t)_{t\in T}$, and assume (the only case which is of interest to us) that there is a countable set $T' \subset T$ which is dense in $T$. We view each $Y_t$ as a point in the Hilbert space $L^2(\Omega, P)$ where $(\Omega, P)$ is the basic probability space. The closed linear span of the r.v.s $(Y_t)_{t\in T}$ in $L^2(\Omega, P)$ is a separable Hilbert space, and the map $t \mapsto Y_t$ is an isometry from $(T,d)$ to its image (by the very definition of the distance $d$). In this manner, we associate a subset of a Hilbert space to each Gaussian process.

22 It is known that (2.127) holds with L = 1, a result known as Slepian’s lemma. Please see the comments at the end of Sect. 2.16.

Conversely, consider a separable Hilbert space, which we may assume to be $\ell^2 = \ell^2(\mathbb{N})$.²³ Consider an independent sequence $(g_i)_{i\ge1}$ of standard Gaussian r.v.s. We can then define the Gaussian process $(X_t)_{t\in\ell^2}$, where

$$
X_t = \sum_{i\ge1} t_i g_i \tag{2.128}
$$

(the series converges in $L^2(\Omega)$). Thus,

$$
E X_t^2 = \sum_{i\ge1} t_i^2 = \|t\|^2 . \tag{2.129}
$$

In this manner, for each subset $T$ of $\ell^2$, we can consider the Gaussian process $(X_t)_{t\in T}$. The distance induced on $T$ by the process coincides with the distance of $\ell^2$ by (2.129).

A subset $T$ of $\ell^2$ will always be provided with the distance induced by $\ell^2$, so we may also write $\gamma_2(T)$ rather than $\gamma_2(T,d)$. We denote by $\mathrm{conv}\,T$ the convex hull of $T$.


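Theorem 2.11.1 For a subset $T$ of $\ell^2$, we have

$$
\gamma_2(\mathrm{conv}\,T) \le L\gamma_2(T) . \tag{2.130}
$$

Of course we also have $\gamma_2(T) \le \gamma_2(\mathrm{conv}\,T)$ since $T \subset \mathrm{conv}\,T$.

Proof To prove (2.130), we observe that since $X_{a_1t_1 + a_2t_2} = a_1X_{t_1} + a_2X_{t_2}$, we have

$$
\sup_{t\in\mathrm{conv}\,T} X_t = \sup_{t\in T} X_t . \tag{2.131}
$$

We then use (2.114) to write

$$
\frac{1}{L}\gamma_2(\mathrm{conv}\,T) \le E\sup_{t\in\mathrm{conv}\,T} X_t = E\sup_{t\in T} X_t \le L\gamma_2(T) .
$$

□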
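Identity (2.131) is simply the fact that, for each fixed realization of the $g_i$, the map $t \mapsto X_t$ is linear, so its supremum over a convex hull is already attained on $T$. A small Python sketch makes this concrete; the finite dimension, the random finite set `T` and the random convex combinations are illustrative assumptions, not constructions from the text.

```python
import random

def x_t(t, g):
    """Value of X_t = sum_i t_i g_i for one fixed realization g of the Gaussian sequence."""
    return sum(ti * gi for ti, gi in zip(t, g))

def random_convex_combination(points, rng):
    """A random point of conv(points): nonnegative weights summing to 1."""
    w = [rng.random() for _ in points]
    total = sum(w)
    w = [wi / total for wi in w]
    return [sum(wj * p[i] for wj, p in zip(w, points)) for i in range(len(points[0]))]

rng = random.Random(1)
dim, n_points = 5, 8
T = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
g = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # one realization of (g_i)

sup_T = max(x_t(t, g) for t in T)
sup_hull_samples = max(x_t(random_convex_combination(T, rng), g) for _ in range(10_000))
print(f"sup over T: {sup_T:.4f}; best of 10000 points of conv T: {sup_hull_samples:.4f}")
```

No sampled point of the hull beats the best point of $T$, since $X_{\sum_j w_jt_j} = \sum_j w_jX_{t_j} \le \max_j X_{t_j}$.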


A basic problem is that it is absolutely not obvious how to construct an admissible 
sequence of partitions on conv T witnessing (2.130). 


Research Problem 2.11.2 Give a geometrical proof of (2.130). 


23 Throughout the book, $\mathbb{N}$ is the set of natural numbers starting at 0, $\mathbb{N} = \{0, 1, \ldots\}$, whereas $\mathbb{N}^* = \mathbb{N}\setminus\{0\}$.

What we mean by geometrical proof is a proof that does not use Gaussian processes but only the geometry of Hilbert space. The difficulty of the problem is that the structure of an admissible sequence which witnesses that $\gamma_2(\mathrm{conv}\,T) \le L\gamma_2(T)$ must depend on the “geometry” of the set $T$. A really satisfactory argument would give a proof that holds in Banach spaces more general than Hilbert space, for example, by providing a positive answer to the following, where the concept of $q$-smooth Banach space is explained in [57]:


Research Problem 2.11.3 Consider a 2-smooth Banach space and the distance $d$ induced by its norm. Is it true that for each subset $T$ of its unit ball, one has $\gamma_2(\mathrm{conv}\,T, d) \le K\sqrt{\log\mathrm{card}\,T}$? More generally, is it true that for each finite subset $T$, one has $\gamma_2(\mathrm{conv}\,T, d) \le K\gamma_2(T, d)$? (Here $K$ may depend on the Banach space, but not on $T$.)


Research Problem 2.11.4 Still more generally, is it true that for a finite subset $T$ of a $q$-smooth Banach space, one has $\gamma_q(\mathrm{conv}\,T) \le K\gamma_q(T)$?


Even when the Banach space is $\ell^p$, I do not know the answer to these problems (unless $p = 2$!). (The Banach space $\ell^p$ is 2-smooth for $p \ge 2$ and $q$-smooth for $p < 2$, where $1/p + 1/q = 1$.) One concrete case is when the set $T$ consists of the first $N$ vectors of the unit basis of $\ell^p$. It is possible to show in this case that $\gamma_q(\mathrm{conv}\,T) \le K(p)(\log N)^{1/q}$, where $1/p + 1/q = 1$. We leave this as a challenge to the reader. The proof here is pretty much the same as for the case $p = q = 2$, which was covered in Sect. 2.6.


Exercise 2.11.5 Prove that if $u \ge 2$, we have $\sum_{k\ge1}(k+1)^{-u^2/2} \le L2^{-u^2/2}$.
We recall the $\ell^2$ norm $\|\cdot\|$ of (2.129). Here is a simple fact.


Proposition 2.11.6 Consider a sequence $(t_k)_{k\ge1}$ such that

$$
\forall k\ge1,\quad \|t_k\| \le 1/\sqrt{\log(k+1)} .
$$

Let $T = \{t_k;\ k\ge1\}$. Then $E\sup_{t\in T} X_t \le L$ and thus $E\sup_{t\in\mathrm{conv}\,T} X_t \le L$ by (2.131).


Proof We have

$$
P\Big(\sup_{k\ge1}|X_{t_k}| \ge u\Big) \le \sum_{k\ge1} P(|X_{t_k}| \ge u) \le \sum_{k\ge1} 2\exp\Big(-\frac{u^2}{2}\log(k+1)\Big) \tag{2.132}
$$

since $X_{t_k}$ is Gaussian with $EX_{t_k}^2 \le 1/\log(k+1)$. For $u \ge 2$, the right-hand side of (2.132) is at most $L\exp(-u^2/L)$ by the result of Exercise 2.11.5, and as usual the conclusion follows from (2.6). □
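The step “for $u \ge 2$, the right-hand side of (2.132) is at most $L\exp(-u^2/L)$” can be made quantitative. Since $2\exp(-\frac{u^2}{2}\log(k+1)) = 2(k+1)^{-u^2/2}$, the Python sketch below sums the first `n_terms` terms (truncation is an assumption we take to be harmless, since the neglected tail is tiny for $u \ge 2$) and compares the result with $2^{-u^2/2} = \exp(-u^2\log 2/2)$. The sample values of $u$ are arbitrary.

```python
import math

def union_bound_rhs(u, n_terms=100_000):
    """Truncated right-hand side of (2.132): sum_{k=1}^{n_terms} 2 (k+1)^(-u^2/2)."""
    s = u * u / 2.0
    return 2.0 * sum((k + 1) ** (-s) for k in range(1, n_terms + 1))

us = [2.0, 2.5, 3.0, 4.0, 6.0]
ratios = [union_bound_rhs(u) / math.exp(-u * u * math.log(2) / 2) for u in us]
for u, r in zip(us, ratios):
    print(f"u = {u}: RHS / 2^(-u^2/2) = {r:.3f}")
```

The ratio is largest at $u = 2$, where it is about $8(\pi^2/6 - 1) \approx 5.16$, and it decreases toward 2 as $u$ grows, consistent with a bound $L\exp(-u^2/L)$.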


Exercise 2.11.7 Deduce Proposition 2.11.6 from (2.34). Hint: Use Exercise 2.4.1. 
It is particularly frustrating not to be able to solve the following special instance of Problem 2.11.2:

Research Problem 2.11.8 In the setting of Proposition 2.11.6, find a geometrical proof that $\gamma_2(\mathrm{conv}\,T) \le L$.


The following shows that the situation of Proposition 2.11.6 is in a sense generic: 


Theorem 2.11.9 Consider a countable set $T \subset \ell^2$, with $0 \in T$. Then we can find a sequence $(t_k)$ with

$$
\forall k\ge1,\quad \|t_k\|\sqrt{\log(k+1)} \le L E\sup_{t\in T} X_t
$$

and

$$
T \subset \mathrm{conv}(\{t_k;\ k\ge1\}) .
$$

Furthermore, we may assume that each $t_k$ is a multiple of the difference of two elements of $T$.²⁴


Proof By Theorem 2.10.1, we can find an admissible sequence $(\mathcal{A}_n)$ of $T$ with

$$
\forall t\in T,\quad \sum_{n\ge0} 2^{n/2}\Delta(A_n(t)) \le L E\sup_{t\in T} X_t := S . \tag{2.133}
$$

We construct sets $T_n \subset T$ such that each $A \in \mathcal{A}_n$ contains exactly one element of $T_n$. We ensure in the construction that $T = \bigcup_{n\ge0} T_n$ and that $T_0 = \{0\}$. (To do this, we simply enumerate the elements of $T$ as $(v_n)_{n\ge0}$ with $v_0 = 0$, and we ensure that $v_n$ is in $T_n$.) For $n \ge 1$, consider the set $U_n$ that consists of all the points

$$
2^{-n/2}\,\frac{t - v}{\|t - v\|}
$$

where $t \in T_n$, $v \in T_{n-1}$, and $t \ne v$. Thus, each element of $U_n$ has norm $2^{-n/2}$, and $U_n$ has at most $N_nN_{n-1} \le N_{n+1}$ elements. Let $U = \bigcup_{n\ge1} U_n$. Then since $\sum_{k\le n} N_{k+1} \le N_{n+2}$, $U$ contains at most $N_{n+2}$ elements of norm $\ge 2^{-n/2}$. We enumerate $U$ as $\{z_k;\ k = 1, \ldots\}$ where the sequence $(\|z_k\|)$ is non-increasing, so that $\|z_k\| < 2^{-n/2}$ for $k > N_{n+2}$. Let us now prove that $\|z_k\| \le L/\sqrt{\log(k+1)}$. If $k \le N_2$, this holds because $\|z_k\| \le 1$. Assume then that $k > N_2$, and let $n \ge 0$ be the largest integer with $k > N_{n+2}$. Then by definition of $n$, we have $k \le N_{n+3}$ and thus $2^{-n/2} \le L/\sqrt{\log k}$. But then $\|z_k\| \le 2^{-n/2} \le L/\sqrt{\log k}$, proving the required inequality.


24 This information is of secondary importance and will be used only much later. 
Consider $t \in T$, so that $t \in T_m$ for some $m \ge 0$. Writing $\pi_n(t)$ for the unique element of $T_n \cap A_n(t)$, since $\pi_0(t) = 0$, we have (omitting the terms where $\pi_n(t) = \pi_{n-1}(t)$)

$$
t = \sum_{1\le n\le m}(\pi_n(t) - \pi_{n-1}(t)) = \sum_{1\le n\le m} a_n(t)u_n(t) , \tag{2.134}
$$

with $a_n(t) = 2^{n/2}\|\pi_n(t) - \pi_{n-1}(t)\|$ and

$$
u_n(t) = 2^{-n/2}\,\frac{\pi_n(t) - \pi_{n-1}(t)}{\|\pi_n(t) - \pi_{n-1}(t)\|} .
$$

Since

$$
\sum_{1\le n\le m} a_n(t) \le \sum_{n\ge1} 2^{n/2}\Delta(A_{n-1}(t)) \le 2S
$$

and since $u_n(t) \in U_n \subset U$, we see from (2.134) that

$$
t = \sum_{1\le n\le m} a_n(t)u_n(t) + \Big(2S - \sum_{1\le n\le m} a_n(t)\Big)\times 0 \in 2S\,\mathrm{conv}(U\cup\{0\}) .
$$

Thus, $T \subset 2S\,\mathrm{conv}(U\cup\{0\}) = \mathrm{conv}(2SU\cup\{0\})$, and it suffices to take $t_k = 2Sz_k$. □
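The counting step in this proof can be checked with explicit numbers. With the convention $N_n = 2^{2^n}$ for $n \ge 1$, an element of $U$ of norm $2^{-n/2}$ has index $k \le N_{n+2}$, so $\|z_k\|\sqrt{\log(k+1)} \le 2^{-n/2}\sqrt{\log(2N_{n+2})}$. The Python sketch below evaluates this bound with logarithms only (the integers $N_n$ are far too large to form); the cutoff of 40 levels is an arbitrary illustrative choice.

```python
import math

LOG2 = math.log(2.0)

def log_N(n):
    """log N_n for N_n = 2^(2^n), n >= 1; we work with logarithms only."""
    return (2 ** n) * LOG2

def norm_times_sqrt_log_bound(n):
    """Bound on ||z_k|| sqrt(log(k+1)) when ||z_k|| = 2^(-n/2), using k + 1 <= 2 N_{n+2}."""
    return 2 ** (-n / 2) * math.sqrt(LOG2 + log_N(n + 2))

bounds = [norm_times_sqrt_log_bound(n) for n in range(1, 41)]
print(max(bounds), bounds[-1])  # maximum at n = 1; limit 2 sqrt(log 2)
```

The bound stays below about 1.77 and tends to $2\sqrt{\log 2} \approx 1.665$, which is the quantitative content of “$2^{-n/2} \le L/\sqrt{\log k}$”.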


Exercise 2.11.10 What is the purpose of the condition $0 \in T$?


Exercise 2.11.11 Prove that if $T \subset \ell^2$ and $0 \in T$, then (even when $T$ is not countable) we can find a sequence $(t_k)$ in $\ell^2$, with $\|t_k\|\sqrt{\log(k+1)} \le L E\sup_{t\in T} X_t$ for all $k$ and

$$
T \subset \overline{\mathrm{conv}}\{t_k;\ k\ge1\} ,
$$

where $\overline{\mathrm{conv}}$ denotes the closed convex hull. (Hint: Do the obvious thing: apply Theorem 2.11.9 to a dense countable subset of $T$.) Denoting now by $\mathrm{conv}^*(A)$ the set of infinite sums $\sum_i \alpha_i a_i$ where $\sum_i |\alpha_i| = 1$ and $a_i \in A$, prove that one can also achieve

$$
T \subset \mathrm{conv}^*\{t_k;\ k\ge1\} .
$$


Exercise 2.11.12 Consider a set $T \subset \ell^2$ with $0 \in T \subset B(0,\delta)$. Prove that we can find a sequence $(t_k)$ in $\ell^2$, with the following properties:

$$
\forall k\ge1,\quad \|t_k\|\sqrt{\log(k+1)} \le L E\sup_{t\in T} X_t , \tag{2.135}
$$

$$
\|t_k\| \le L\delta , \tag{2.136}
$$

$$
T \subset \overline{\mathrm{conv}}\{t_k;\ k\ge1\} , \tag{2.137}
$$

where $\overline{\mathrm{conv}}$ denotes the closed convex hull. Hint: Copy the proof of Theorem 2.11.9, observing that since $T \subset B(0,\delta)$ one may choose $\mathcal{A}_n = \{T\}$ and $T_n = \{0\}$ for $n \le n_0$, where $n_0$ is the smallest integer for which $2^{n_0/2} \ge \delta^{-1}E\sup_{t\in T} X_t$, and thus $U_n = \emptyset$ for $n \le n_0$.


The following problems are closely related to Problem 2.11.2:
Research Problem 2.11.13 Give a geometric proof of the following fact: Given subsets $(T_k)_{k\le N}$ of a Hilbert space and $T = \sum_{k\le N} T_k = \{x_1 + \cdots + x_N;\ \forall k\le N,\ x_k\in T_k\}$, prove that $\gamma_2(T) \le L\sum_{k\le N}\gamma_2(T_k)$.


We do not even know how to solve the following special case:
Research Problem 2.11.14 Consider sets $(T_k)_{k\le N}$ in a Hilbert space, and assume that each $T_k$ consists of $M$ vectors of length 1. Let $T = \sum_{k\le N} T_k$. Give a geometrical proof of the fact that $\gamma_2(T) \le LN\sqrt{\log M}$.


The next exercise is inspired by the paper [5] of S. Artstein. It is more elaborate and may be omitted on first reading. A Bernoulli r.v. $\varepsilon$ is such that $P(\varepsilon = \pm1) = 1/2$.²⁵


Exercise 2.11.15 Consider a subset $T \subset \mathbb{R}^n$, where $\mathbb{R}^n$ is provided with the Euclidean distance. We assume that for some $\delta > 0$, we have

$$
0 \in T \subset B(0,\delta) . \tag{2.138}
$$

Consider independent Bernoulli r.v.s $(\varepsilon_{i,p})_{i,p\ge1}$. Given a number $q \le n$, consider the operator $U_q : \mathbb{R}^n \to \mathbb{R}^q$ given by

$$
U_q(x) = \Big(\sum_{i\le n}\varepsilon_{i,p}x_i\Big)_{p\le q} .
$$

(a) Prove that $\|U_q\| \ge \sqrt{n}$.
We want to prove that despite (a), there exists a number $L$ such that if $E\sup_{t\in T}\sum_{i\le n} g_it_i \le \delta\sqrt{q}$, then with high probability

$$
U_q(T) \subset B(0, L\delta\sqrt{q}) , \tag{2.139}
$$

whereas from (2.138) we would not expect better than $U_q(T) \subset B(0, \delta\sqrt{n})$.
(b) Use the sub-Gaussian inequality (6.1.1) to prove that if $\|x\| = 1$, then

$$
E\exp\Big(\frac{1}{4}\Big(\sum_{i\le n}\varepsilon_{i,p}x_i\Big)^2\Big) \le L . \tag{2.140}
$$


25 One must distinguish Bernoulli r.v.s $\varepsilon_i$ from positive numbers $\varepsilon_k$!
(c) Use (2.140) and independence to prove that for $x \in \mathbb{R}^n$ and $v \ge 1$,

$$
P\big(\|U_q(x)\| \ge Lv\sqrt{q}\,\|x\|\big) \le \exp(-v^2q) . \tag{2.141}
$$

(d) Use (2.141) to prove that with probability close to 1, for each of the vectors $t_k$ of Exercise 2.11.12, one has $\|U_q(t_k)\| \le L\delta\sqrt{q}$, and conclude.
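Part (a) and the heuristic behind (2.139) can be seen in a quick simulation. The Python sketch below (the dimensions `n`, `q`, the seed and both test vectors are arbitrary illustrative choices, not taken from the text) builds the sign matrix, exhibits a unit vector adapted to the signs whose image has norm at least $\sqrt n$, and compares it with a unit vector fixed independently of the signs, whose image typically has norm of order $\sqrt q$, in line with (2.141).

```python
import math
import random

def random_sign_matrix(q, n, rng):
    """eps[p][i] plays the role of epsilon_{i,p}: independent signs with P(+1) = P(-1) = 1/2."""
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(q)]

def apply_U(eps, x):
    """U_q(x) = (sum_{i <= n} epsilon_{i,p} x_i)_{p <= q}."""
    return [sum(e * xi for e, xi in zip(row, x)) for row in eps]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

rng = random.Random(3)
n, q = 400, 20
eps = random_sign_matrix(q, n, rng)

# Part (a): this unit vector is built from the signs; the first coordinate of its
# image is sum_i eps_{i,1}^2 / sqrt(n) = sqrt(n), hence ||U_q|| >= sqrt(n).
x_bad = [e / math.sqrt(n) for e in eps[0]]
ratio_bad = norm(apply_U(eps, x_bad)) / math.sqrt(n)

# A unit vector that does not depend on the signs: its image has norm of order sqrt(q).
x_fixed = [1.0 / math.sqrt(n)] * n
ratio_fixed = norm(apply_U(eps, x_fixed)) / math.sqrt(q)
print(f"||U_q x_bad|| / sqrt(n) = {ratio_bad:.3f}, ||U_q x_fixed|| / sqrt(q) = {ratio_fixed:.3f}")
```

The point of the exercise is that the bad directions depend on the signs, so a fixed set $T$ of small Gaussian width is unlikely to contain them.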


We end this section with a discussion of a question which shares some features with Problem 2.11.2, in the sense that it concerns a property which is obvious, on one hand, but difficult to prove without using the magic of linearity.²⁶ For $k \le N$, let us consider Gaussian processes $(X_{k,t})_{t\in T_k}$ with associated distances $d_k$. On the space $T = \prod_{k\le N} T_k$, let us consider the distance $d$ given by

$$
d\big((t_k)_{k\le N}, (t'_k)_{k\le N}\big) = \Big(\sum_{k\le N} d_k(t_k, t'_k)^2\Big)^{1/2} . \tag{2.142}
$$

Proposition 2.11.16 We have

$$
\gamma_2(T,d) \le L\sum_{k\le N}\gamma_2(T_k, d_k) . \tag{2.143}
$$


Proof Assuming without loss of generality that the processes $(X_{k,t})_{t\in T_k}$, $k \le N$, are independent, we can consider the Gaussian process $(X_t)_{t\in T}$ given for $t = (t_k)_{k\le N}$ by $X_t = \sum_{k\le N} X_{k,t_k}$. It is obvious that the distance $d$ of (2.142) is associated with this process. It is also obvious that

$$
\sup_{t\in T} X_t = \sum_{k\le N}\sup_{t\in T_k} X_{k,t} .
$$

Taking expectation and combining with (2.114) concludes the proof. □


The question now is to prove (2.143) without using Gaussian processes, for example, by proving it for any sequence $((T_k, d_k))_{k\le N}$ of metric spaces. The most interesting part of that project is that it is unexpectedly hard. Is it the sign that we are still missing an important ingredient? In the next exercise, we show how to prove what is about the simplest possible case of (2.143), which is already pretty challenging.


Exercise 2.11.17 Throughout this exercise, each space $T_k$ consists of $M_k$ points, and the mutual distance of any two different points of $T_k$ is $\varepsilon_k > 0$. The goal is to prove the inequality

$$
\int_0^\infty\sqrt{\log N(T,d,\varepsilon)}\,d\varepsilon \le L\sum_{k\le N}\varepsilon_k\sqrt{\log M_k} . \tag{2.144}
$$


26 There are several equally frustrating instances of this situation. 
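Throughout the exercise, $I$ denotes a subset of $\{1,\ldots,N\}$, and $I^c$ denotes its complement.

(a) Prove that if $\sum_{k\in I}\varepsilon_k^2 \le \varepsilon^2$, then $N(T,d,\varepsilon) \le \prod_{k\in I^c} M_k$.
(b) Show that to prove (2.144), it suffices to prove the following: Consider two sequences $(\varepsilon_k)_{k\le N}$ and $(\eta_k)_{k\le N}$. For $\varepsilon > 0$, define $S(\varepsilon)$ by

$$
S(\varepsilon)^2 = \inf\Big\{\sum_{k\in I^c}\eta_k^2;\ \sum_{k\in I}\varepsilon_k^2 \le \varepsilon^2\Big\} ,
$$

where the infimum is over all choices of $I$. Then

$$
\int_0^\infty S(\varepsilon)\,d\varepsilon \le L\sum_{k\le N}\varepsilon_k\eta_k . \tag{2.145}
$$

(c) To prove (2.145), show that it suffices to prove the following: Consider a function $h > 0$ on a probability space. For $\varepsilon > 0$, define $S(\varepsilon)$ by

$$
S(\varepsilon)^2 = \inf\Big\{\int_{A^c}\frac{1}{h}\,d\mu;\ \int_A h\,d\mu \le \varepsilon^2\Big\} ,
$$

where the infimum is over all choices of $A$. Then $\int_0^\infty S(\varepsilon)\,d\varepsilon \le L$. Hint: Reduce to the case where $\sum_{k\le N}\varepsilon_k\eta_k = 1$. Use the probability $\mu$ on $\{1,\ldots,N\}$ such that $\mu(\{k\}) = \varepsilon_k\eta_k$ and the function $h$ given by $h(k) = \varepsilon_k/\eta_k$.

(d) Show that it suffices to prove that $\sum_{\ell\in\mathbb{Z}} 2^{-\ell}S(2^{-\ell}) \le L$.

(e) Assuming for simplicity that $\mu$ has no atoms,²⁷ prove the statement given in (d). Hint: For $\ell \in \mathbb{Z}$ and $2^{-2\ell} \le \int h\,d\mu$, consider the set $A_\ell$ of the type $A_\ell = \{h \le t_\ell\}$ where $t_\ell$ is such that $\int_{A_\ell} h\,d\mu = 2^{-2\ell}$, so that $S(2^{-\ell})^2 \le \int_{A_\ell^c}\frac{1}{h}\,d\mu$. Warning: This is not easy.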


Exercise 2.11.18 This exercise continues the previous one. The spaces $(T_k, d_k)$ are now any metric spaces, and the goal is to prove that

$$
\int_0^\infty\sqrt{\log N(T,d,\varepsilon)}\,d\varepsilon \le L\sum_{k\le N}\int_0^\infty\sqrt{\log N(T_k,d_k,\varepsilon)}\,d\varepsilon . \tag{2.146}
$$

Proving this requires overcoming the main difficulty of proving (2.143), but to prove (2.143) itself, it will be convenient to use different tools, and that proof is the object of Exercise 3.1.6.


27 I am sure that this is true without this hypothesis, but I did not find the energy to carry out the details.
(a) Show that to prove (2.146), it suffices to prove the following: Consider decreasing functions $f_k : \mathbb{R}^+ \to \mathbb{R}^+$, and for $\varepsilon > 0$, define $V(\varepsilon)$ by

$$
V(\varepsilon)^2 = \inf\Big\{\sum_{k\le N} f_k(\varepsilon_k)^2;\ \sum_{k\le N}\varepsilon_k^2 \le \varepsilon^2\Big\} ,
$$

where the infimum is taken over all families $(\varepsilon_k)_{k\le N}$. Then

$$
\int_0^\infty V(\varepsilon)\,d\varepsilon \le L\sum_{k\le N}\int_0^\infty f_k(\varepsilon)\,d\varepsilon . \tag{2.147}
$$

(b) When each $f_k$ is of the type $f_k = \eta_k\mathbf{1}_{[0,\varepsilon_k[}$, deduce (2.147) from (2.145).

(c) Convince yourself that by approximation, it suffices to consider the case where each $f_k$ is a finite sum $\sum_\ell 2^{-\ell}\mathbf{1}_{[0,\varepsilon_{k,\ell}[}$.

(d) In the case (c), prove (2.147) by applying the special case (b) to the family $f_{k,\ell}$ of functions given by $f_{k,\ell} := 2^{-\ell}\mathbf{1}_{[0,\varepsilon_{k,\ell}[}$ for all relevant values of $k, \ell$. Hint: This is a bit harder.


2.12 Dreams 


We may reformulate the inequality (2.114)

$$
\frac{1}{L}\gamma_2(T,d) \le E\sup_{t\in T} X_t \le L\gamma_2(T,d)
$$

of Theorem 2.10.1 by the statement

Chaining suffices to explain the size of a Gaussian process. (2.148)


We simply mean that the “natural” chaining bound for the size of a Gaussian process 
(i.e., the right-hand side inequality in (2.114)) is of correct order, provided one uses 
the best possible chaining. This is what the left-hand side of (2.114) shows. We may 
dream of removing the word “Gaussian” in that statement. The desire to achieve this 
lofty goal in as many situations as possible motivates much of the rest of the book. 

Besides the generic chaining, we have found in Theorem 2.11.9 another optimal way to bound Gaussian processes: to put them into the convex hull of a “small” process, that is, to use the inequality

$$
E\sup_{t\in T} X_t \le L\inf\Big\{S;\ T\subset\mathrm{conv}\{t_k, k\ge1\},\ \|t_k\| \le S/\sqrt{\log(k+1)}\Big\} .
$$


Since we do not really understand the geometry of going from a set to its convex hull, it is better (for the time being) to consider this method as somewhat distinct from the generic chaining. Let us try to formulate it in a way which is suitable for generalizations. Given a countable set $\mathcal{V}$ of r.v.s, let us define the (possibly infinite) quantity


$$
S(\mathcal{V}) = \inf\Big\{S > 0;\ \int_S^\infty\sum_{V\in\mathcal{V}} P(|V| \ge u)\,du \le S\Big\} . \tag{2.149}
$$


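Lemma 2.12.1 It holds that

$$
E\sup_{V\in\mathrm{conv}\,\mathcal{V}}|V| \le 2S(\mathcal{V}) . \tag{2.150}
$$

Proof We combine (2.6) with the fact that for $S > S(\mathcal{V})$, we have

$$
\int_0^\infty P\Big(\sup_{V\in\mathrm{conv}\,\mathcal{V}}|V| \ge u\Big)\,du \le S + \int_S^\infty\sum_{V\in\mathcal{V}} P(|V| \ge u)\,du \le 2S .
$$

□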
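For a finite family of centered Gaussian variables $V = \sigma g$, the quantity (2.149) is easy to compute, because $\int_s^\infty P(|\sigma g| \ge u)\,du$ has a closed form in terms of the complementary error function. The Python sketch below finds $S(\mathcal{V})$ by bisection, using that $S \mapsto \int_S^\infty\sum_V P(|V| \ge u)\,du - S$ is decreasing, and then checks (2.150) by simulation. The family $\sigma_k = 1/\sqrt{\log(k+1)}$, $k \le 100$ (a truncation of the setting of Proposition 2.11.6), and the independence of the variables in the Monte Carlo step are illustrative assumptions; Lemma 2.12.1 itself needs no independence.

```python
import math
import random

def tail_integral(sigma, s):
    """int_s^inf P(|sigma g| >= u) du for g standard Gaussian.

    With a = s / (sigma sqrt 2) this equals sigma sqrt(2) (exp(-a^2)/sqrt(pi) - a erfc(a)).
    """
    a = s / (sigma * math.sqrt(2.0))
    return sigma * math.sqrt(2.0) * (math.exp(-a * a) / math.sqrt(math.pi) - a * math.erfc(a))

def excess(sigmas, s):
    """int_s^inf sum_V P(|V| >= u) du - s, a decreasing function of s."""
    return sum(tail_integral(sg, s) for sg in sigmas) - s

def S_of_family(sigmas, tol=1e-10):
    """Bisection for S(V) = inf{S > 0 : int_S^inf sum_V P(|V| >= u) du <= S}, cf. (2.149)."""
    lo, hi = 0.0, 1.0
    while excess(sigmas, hi) > 0:  # enlarge the bracket until the defining condition holds
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(sigmas, mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi

sigmas = [1.0 / math.sqrt(math.log(k + 1)) for k in range(1, 101)]
S = S_of_family(sigmas)

# Monte Carlo estimate of E sup_k |sigma_k g_k| with independent g_k (an illustrative choice)
rng = random.Random(4)
n_samples = 2000
mc_sup = sum(max(abs(sg * rng.gauss(0.0, 1.0)) for sg in sigmas) for _ in range(n_samples)) / n_samples
print(f"S(V) = {S:.4f}, E sup |V| ~ {mc_sup:.4f}, 2 S(V) = {2 * S:.4f}")
```

The closed form follows from $\frac{d}{da}\big(e^{-a^2}/\sqrt\pi - a\,\mathrm{erfc}(a)\big) = -\mathrm{erfc}(a)$.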


Thus, (2.150) provides a method to bound stochastic processes. This method 
may look childish, but for Gaussian processes, the following reformulation of 
Theorem 2.11.9 shows that it is in fact optimal: 


Theorem 2.12.2 Consider a countable set $T$. Consider a Gaussian process $(X_t)_{t\in T}$, and assume that $X_{t_0} = 0$ for some $t_0 \in T$. Then there exists a countable set $\mathcal{V}$ of Gaussian r.v.s, each of which is a multiple of the difference of two variables $X_t$, with

$$
\forall t\in T,\quad X_t \in \mathrm{conv}\,\mathcal{V} , \tag{2.151}
$$

$$
S(\mathcal{V}) \le L E\sup_{t\in T} X_t . \tag{2.152}
$$


To understand the need of the condition $X_{t_0} = 0$ for some $t_0$, think of the case where $T$ consists of one single point. The proof of Theorem 2.12.2 is nearly obvious by using (2.132) to bound $S(\mathcal{V})$ for the set $\mathcal{V}$ consisting of the variables $X_{t_k}$, for the sequence $(t_k)$ constructed in Theorem 2.11.9. We may dream of proving statements such as Theorem 2.12.2 for many classes of processes.

Also worthy of detailing is another remarkable geometric consequence of Theorem 2.11.9 in a somewhat different direction. Consider an integer $N$. Considering i.i.d. standard Gaussian r.v.s, we define as usual the process $X_t = \sum_{i\le N} g_it_i$. We may view an element $t$ of $\mathbb{R}^N$ as a function on $\mathbb{R}^N$ by the canonical duality, and therefore view $t$ as a r.v. on the probability space $(\mathbb{R}^N, \mu)$, where $\mu$ is the law of the sequence $(g_i)_{i\le N}$. The processes $(X_t)$ and $(t)$ have the same law; hence, they are really the same object viewed in two different ways. Consider a subset $T$ of $\mathbb{R}^N$, and assume that $T \subset \mathrm{conv}\{t_k;\ k\ge1\}$. Then for any $v > 0$, we have

$$
\Big\{\sup_{t\in T} t > v\Big\} \subset \bigcup_{k\ge1}\{t_k > v\} . \tag{2.153}
$$


The sets $\{t_k > v\}$ on the right are very simple: they are half-spaces. Assume now that for $k \ge 1$ and a certain $S$, we have $\|t_k\|\sqrt{\log(k+1)} \le S$. Then for $u \ge 2$,

$$
\sum_{k\ge1}\mu(\{t_k \ge Su\}) \le \sum_{k\ge1}\exp\Big(-\frac{u^2}{2}\log(k+1)\Big) \le L\exp(-u^2/L) ,
$$

the very same computation as in (2.132). Theorem 2.11.9 implies that one may find such $t_k$ for $S = LE\sup_t X_t$. Therefore for $v \ge LE\sup_t X_t$, the fact that the set in the left-hand side of (2.153) is small (in the sense of probability) may be witnessed by the fact that this set can be covered by a union of simple sets (half-spaces), the sum of the probabilities of which is small.

We may dream that something similar occurs in many other settings. In Chap. 13, 
which can be read right now, we will meet such a fundamental setting, which 
inspired the author’s lifetime favorite problem (see Sect. 13.3). 


2.13 A First Look at Ellipsoids 


We have illustrated the gap between Dudley’s bound (2.41) and the sharper 
bound (2.34), using examples (2.49) and (2.42). These examples might look 
artificial, but here we demonstrate that the gap between Dudley’s bound (2.41) and 
the generic chaining bound (2.34) already exists for ellipsoids in Hilbert space. Truly 
understanding ellipsoids will be fundamental in several subsequent questions, such 
as the matching theorems of Chap. 4. A further study of ellipsoids is proposed in
Sect. 3.2. 
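Given a sequence $(a_i)_{i\ge1}$, $a_i > 0$, we consider the ellipsoid

$$
\mathcal{E} = \Big\{t\in\ell^2;\ \sum_{i\ge1}\frac{t_i^2}{a_i^2} \le 1\Big\} . \tag{2.154}
$$

Proposition 2.13.1 When $\sum_{i\ge1} a_i^2 < \infty$, we have

$$
\frac{1}{L}\Big(\sum_{i\ge1} a_i^2\Big)^{1/2} \le E\sup_{t\in\mathcal{E}} X_t \le \Big(\sum_{i\ge1} a_i^2\Big)^{1/2} . \tag{2.155}
$$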
Proof The Cauchy-Schwarz inequality implies

$$
Y := \sup_{t\in\mathcal{E}} X_t = \sup_{t\in\mathcal{E}}\sum_{i\ge1} t_ig_i \le \Big(\sum_{i\ge1} a_i^2g_i^2\Big)^{1/2} . \tag{2.156}
$$

Taking $t_i = a_i^2g_i/(\sum_{j\ge1} a_j^2g_j^2)^{1/2}$ yields that actually $Y = (\sum_{i\ge1} a_i^2g_i^2)^{1/2}$ and thus $EY^2 = \sum_{i\ge1} a_i^2$. The right-hand side of (2.155) follows from the Cauchy-Schwarz inequality:

$$
EY \le (EY^2)^{1/2} = \Big(\sum_{i\ge1} a_i^2\Big)^{1/2} . \tag{2.157}
$$

For the left-hand side, let $\sigma = \max_{i\ge1}|a_i|$. Since $Y = \sup_{t\in\mathcal{E}} X_t \ge |a_i||g_i|$ for any $i$, we have $\sigma \le LEY$. Also,

$$
EX_t^2 = \sum_{i\ge1} t_i^2 \le \max_j a_j^2\sum_{i\ge1}\frac{t_i^2}{a_i^2} \le \sigma^2 . \tag{2.158}
$$

Then (2.118) implies²⁸

$$
E(Y - EY)^2 \le L\sigma^2 \le L(EY)^2 ,
$$

so that $\sum_{i\ge1} a_i^2 = EY^2 = E(Y - EY)^2 + (EY)^2 \le L(EY)^2$. □
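The explicit maximizer in the proof is easy to test numerically. The Python sketch below (a finite ellipsoid with the illustrative choice $a_i = 1/i$, $i \le 50$, and an arbitrary seed) checks that $t_i = a_i^2g_i/(\sum_j a_j^2g_j^2)^{1/2}$ lies on the boundary of $\mathcal{E}$ and achieves $X_t = (\sum_i a_i^2g_i^2)^{1/2}$, then compares a Monte Carlo estimate of $EY$ with the upper bound in (2.155).

```python
import math
import random

def sup_over_ellipsoid(a, g):
    """Y = sup_{t in E} sum_i t_i g_i and the maximizer t_i = a_i^2 g_i / (sum_j a_j^2 g_j^2)^(1/2)."""
    norm = math.sqrt(sum((ai * gi) ** 2 for ai, gi in zip(a, g)))
    t_star = [ai * ai * gi / norm for ai, gi in zip(a, g)]
    value = sum(ti * gi for ti, gi in zip(t_star, g))
    return value, t_star, norm

rng = random.Random(2)
a = [1.0 / i for i in range(1, 51)]  # illustrative semi-axes a_i = 1/i
g = [rng.gauss(0.0, 1.0) for _ in a]
value, t_star, norm = sup_over_ellipsoid(a, g)
constraint = sum((ti / ai) ** 2 for ti, ai in zip(t_star, a))  # equals 1: t_star is on the boundary

# Monte Carlo estimate of E Y against the upper bound (sum_i a_i^2)^(1/2) of (2.155)
samples = [sup_over_ellipsoid(a, [rng.gauss(0.0, 1.0) for _ in a])[0] for _ in range(5000)]
mean_Y = sum(samples) / len(samples)
upper = math.sqrt(sum(ai * ai for ai in a))
print(f"E Y ~ {mean_Y:.4f} <= (sum a_i^2)^(1/2) = {upper:.4f}")
```

Since $Y \ge a_1|g_1| = |g_1|$ here, the estimate of $EY$ also stays above $E|g_1| = \sqrt{2/\pi} \approx 0.80$, a concrete instance of $\sigma \le LEY$.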


As a consequence of Theorem 2.10.1,

$$
\gamma_2(\mathcal{E}) \le L\Big(\sum_{i\ge1} a_i^2\Big)^{1/2} . \tag{2.159}
$$
This statement is purely about the geometry of ellipsoids. The proof we gave was 
rather indirect, since it involved Gaussian processes. Later on, in Chap. 4, we will 
learn how to give “purely geometric” proofs of similar statements that will have 
many consequences. 
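Let us now assume that the sequence $(a_i)_{i\ge1}$ is non-increasing. Since

$$
2^n \le i < 2^{n+1} \Rightarrow a_{2^n} \ge a_i \ge a_{2^{n+1}} ,
$$

we get

$$
\sum_{i\ge1} a_i^2 = \sum_{n\ge0}\ \sum_{2^n\le i<2^{n+1}} a_i^2 \le \sum_{n\ge0} 2^na_{2^n}^2
$$

and

$$
\sum_{i\ge1} a_i^2 \ge \sum_{n\ge0} 2^na_{2^{n+1}}^2 = \frac{1}{2}\sum_{n\ge1} 2^na_{2^n}^2 ,
$$

and thus $\sum_{n\ge0} 2^na_{2^n}^2 \le 3\sum_{i\ge1} a_i^2$. So we may rewrite (2.155) as

$$
\frac{1}{L}\Big(\sum_{n\ge0} 2^na_{2^n}^2\Big)^{1/2} \le E\sup_{t\in\mathcal{E}} X_t \le \Big(\sum_{n\ge0} 2^na_{2^n}^2\Big)^{1/2} . \tag{2.160}
$$

28 One may extend (2.118) to the case where $U$ is infinite by a proper definition of $\sup_{t\in U} X_t$.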
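The two comparisons between $\sum_i a_i^2$ and $\sum_n 2^na_{2^n}^2$ only use monotonicity, and the same arguments apply to a truncated sequence (summing $i < 2^{N+1}$ against $n \le N$). A small Python check, with two arbitrary non-increasing test sequences:

```python
def dyadic_comparison(a):
    """Return (sum_{i < 2^(N+1)} a_i^2, sum_{n <= N} 2^n a_{2^n}^2) for a non-increasing list a.

    a[0] stores a_1; N is the largest integer with 2^(N+1) - 1 <= len(a).
    """
    N = (len(a) + 1).bit_length() - 2
    full = sum(x * x for x in a[: 2 ** (N + 1) - 1])
    dyadic = sum(2 ** n * a[2 ** n - 1] ** 2 for n in range(N + 1))
    return full, dyadic

examples = {
    "a_i = 1/i": [1.0 / i for i in range(1, 5001)],
    "a_i = i^(-0.7)": [i ** -0.7 for i in range(1, 5001)],
}
results = {name: dyadic_comparison(a) for name, a in examples.items()}
for name, (full, dyadic) in results.items():
    print(f"{name}: sum a_i^2 = {full:.4f}, dyadic sum = {dyadic:.4f}")
```

In both cases the dyadic sum lies between $\sum_i a_i^2$ and $3\sum_i a_i^2$, as the two displays above predict.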


Proposition 2.13.1 describes the size of ellipsoids with respect to Gaussian 
processes. Our next result describes their size with respect to Dudley’s entropy 
bound (2.38). 


Proposition 2.13.2 We have

$$
\frac{1}{L}\sum_{n\ge0} 2^{n/2}a_{2^n} \le \sum_{n\ge0} 2^{n/2}e_n(\mathcal{E}) \le L\sum_{n\ge0} 2^{n/2}a_{2^n} . \tag{2.161}
$$


The right-hand sides in (2.160) and (2.161) are distinctly different.²⁹ Dudley’s bound fails to describe the behavior of Gaussian processes on ellipsoids. This is a simple occurrence of a general phenomenon. In some sense, an ellipsoid is smaller than what one would predict just by looking at its entropy numbers $e_n(\mathcal{E})$. This idea will be investigated further in Sect. 4.1.
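A concrete sequence shows how different the two right-hand sides can be. Take (an illustrative choice, not from the text) the non-increasing sequence $a_i = 1/(\sqrt{i}\,(\log_2 i + 1))$, so that $a_{2^n} = 2^{-n/2}/(n+1)$. Then $\sum_n 2^na_{2^n}^2 = \sum_n (n+1)^{-2}$ converges, so $E\sup_{t\in\mathcal{E}} X_t < \infty$ by (2.160), while $\sum_n 2^{n/2}a_{2^n} = \sum_n (n+1)^{-1}$ diverges, so Dudley's entropy sum is infinite by (2.161). The Python sketch below evaluates partial sums.

```python
import math

def a(i):
    """Illustrative non-increasing sequence with a_{2^n} = 2^(-n/2) / (n + 1)."""
    return 1.0 / (math.sqrt(i) * (math.log2(i) + 1.0))

def partial_sums(n_max):
    """Partial sums of the right-hand sides of (2.160) and (2.161), up to n = n_max."""
    gaussian_side = math.sqrt(sum(2 ** n * a(2 ** n) ** 2 for n in range(n_max + 1)))
    dudley_side = sum(2 ** (n / 2) * a(2 ** n) for n in range(n_max + 1))
    return gaussian_side, dudley_side

for n_max in (10, 100, 200):
    g_side, d_side = partial_sums(n_max)
    print(f"n_max = {n_max}: (sum 2^n a^2)^(1/2) = {g_side:.4f}, sum 2^(n/2) a = {d_side:.4f}")
```

The first column stays below $\pi/\sqrt6 \approx 1.28$, while the second grows like $\log n_{\max}$ without bound.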


Exercise 2.13.3 Prove that for an ellipsoid $\mathcal{E}$ of $\mathbb{R}^m$, one has

$$
\sum_{n\ge0} 2^{n/2}e_n(\mathcal{E}) \le L\sqrt{\log(m+1)}\,\gamma_2(\mathcal{E}, d) ,
$$

and that this estimate is essentially optimal. Compare with (2.58).


The proof of (2.161) hinges on ideas which are at least 50 years old and which 
relate to the methods of Exercise 2.5.9. The left-hand side is the easier part (it is 
also the most important for us). It follows from the next lemma, the proof of which 
is basically a special case of (2.45). 


Lemma 2.13.4 We have $e_n(\mathcal{E}) \ge \frac{1}{2}a_{2^n}$.


Proof Consider the following ellipsoid in $\mathbb{R}^{2^n}$:

$$
\mathcal{E}_n = \Big\{(t_i)_{i\le2^n};\ \sum_{i\le2^n}\frac{t_i^2}{a_i^2} \le 1\Big\} .
$$

29 This difference may seem rather small, but, as we shall see in Chap. 4, there are natural situations where it really matters.
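Since $\mathcal{E}_n$ is the image of $\mathcal{E}$ by a contraction³⁰ (namely, the “projection on the first $2^n$ coordinates”), it holds that $e_n(\mathcal{E}_n) \le e_n(\mathcal{E})$.

Throughout the rest of this section, we denote by $B$ the centered unit Euclidean ball of $\mathbb{R}^{2^n}$ and by Vol the volume in this space. Let us consider a subset $T$ of $\mathcal{E}_n$, with $\mathrm{card}\,T \le 2^{2^n}$, and $\varepsilon > 0$; then

$$
\mathrm{Vol}\Big(\bigcup_{t\in T}(\varepsilon B + t)\Big) \le \sum_{t\in T}\mathrm{Vol}(\varepsilon B + t) \le 2^{2^n}\varepsilon^{2^n}\mathrm{Vol}\,B = (2\varepsilon)^{2^n}\mathrm{Vol}\,B .
$$

Since we have assumed that the sequence $(a_i)$ is non-increasing, we have $a_i \ge a_{2^n}$ for $i \le 2^n$ and thus $a_{2^n}B \subset \mathcal{E}_n$, so that $\mathrm{Vol}\,\mathcal{E}_n \ge a_{2^n}^{2^n}\mathrm{Vol}\,B$. Thus, whenever $2\varepsilon < a_{2^n}$, we cannot have $\mathcal{E}_n \subset \bigcup_{t\in T}(\varepsilon B + t)$, so that $e_n(\mathcal{E}_n) \ge a_{2^n}/2$. □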


We now turn to the upper bound, which relies on a special case of (2.46). We keep 
the notation of the proof of Lemma 2.13.4. 


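Lemma 2.13.5 We have

$$
e_{n+3}(\mathcal{E}) \le e_{n+3}(\mathcal{E}_n) + a_{2^n} . \tag{2.162}
$$

Proof We observe that when $t \in \mathcal{E}$, then, using that $a_i \le a_{2^n}$ for $i > 2^n$ in the last inequality,

$$
1 \ge \sum_{i\ge1}\frac{t_i^2}{a_i^2} \ge \sum_{i>2^n}\frac{t_i^2}{a_i^2} \ge \frac{1}{a_{2^n}^2}\sum_{i>2^n} t_i^2 ,
$$

so that $(\sum_{i>2^n} t_i^2)^{1/2} \le a_{2^n}$ and, viewing $\mathcal{E}_n$ as a subset of $\mathcal{E}$, we have $d(t, \mathcal{E}_n) \le a_{2^n}$. Thus, if for $k \ge 1$ we cover $\mathcal{E}_n$ by $N_k$ balls of radius $\varepsilon$, the balls with the same centers but radius $\varepsilon + a_{2^n}$ cover $\mathcal{E}$. This proves that $e_k(\mathcal{E}) \le e_k(\mathcal{E}_n) + a_{2^n}$ and hence (2.162). □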
Lemma 2.13.6 Let $\varepsilon = \max_{k\le n} a_{2^k}2^{k-n}$. Consider a subset $A$ of $\mathcal{E}_n$ with the following property:

Any two points of $A$ are at mutual distance $\ge 2\varepsilon$ . (2.163)

Then $\mathrm{card}\,A \le N_{n+3}$.


Proof The balls centered at the points of $A$, with radius $\varepsilon$, have disjoint interiors, so that the volume of their union is $\mathrm{card}\,A\,\mathrm{Vol}(\varepsilon B)$, and since these balls are entirely contained in $\mathcal{E}_n + \varepsilon B$, we have

$$
\mathrm{card}\,A\,\mathrm{Vol}(\varepsilon B) \le \mathrm{Vol}(\mathcal{E}_n + \varepsilon B) . \tag{2.164}
$$

30 Generally speaking, a map $\varphi$ from a metric space $(T,d)$ to a metric space $(T',d')$ is called a contraction if $d'(\varphi(x), \varphi(y)) \le d(x,y)$.
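For $t = (t_i)_{i\le2^n} \in \mathcal{E}_n$, we have $\sum_{i\le2^n} t_i^2/a_i^2 \le 1$, and for $t'$ in $\varepsilon B$, we have $\sum_{i\le2^n} t_i'^2/\varepsilon^2 \le 1$. Let $c_i = 2\max(\varepsilon, a_i)$. Since

$$
\sum_{i\le2^n}\frac{(t_i + t'_i)^2}{c_i^2} \le \sum_{i\le2^n}\frac{2t_i^2 + 2t_i'^2}{c_i^2} \le \sum_{i\le2^n}\frac{1}{2}\Big(\frac{t_i^2}{a_i^2} + \frac{t_i'^2}{\varepsilon^2}\Big) \le 1 ,
$$

we have

$$
\mathcal{E}_n + \varepsilon B \subset \mathcal{E}' := \Big\{t;\ \sum_{i\le2^n}\frac{t_i^2}{c_i^2} \le 1\Big\} .
$$

Therefore

$$
\mathrm{Vol}(\mathcal{E}_n + \varepsilon B) \le \mathrm{Vol}\,\mathcal{E}' = \mathrm{Vol}\,B\prod_{i\le2^n} c_i ,
$$

and comparing with (2.164) yields

$$
\mathrm{card}\,A \le \prod_{i\le2^n}\frac{c_i}{\varepsilon} = 2^{2^n}\prod_{i\le2^n}\max\Big(1, \frac{a_i}{\varepsilon}\Big) .
$$

Next it follows from the choice of $\varepsilon$ that for any $k \le n$, we have $a_{2^k}2^{k-n} \le \varepsilon$. Then $a_i/\varepsilon \le 2^{n-k}$ for $2^k \le i < 2^{k+1}$, so that (the factor for $i = 2^n$ being 1)

$$
\prod_{i\le2^n}\max\Big(1,\frac{a_i}{\varepsilon}\Big) = \prod_{k\le n-1}\ \prod_{2^k\le i<2^{k+1}}\max\Big(1,\frac{a_i}{\varepsilon}\Big) \le \prod_{k\le n-1}(2^{n-k})^{2^k} = 2^{\sum_{k\le n-1}(n-k)2^k} \le 2^{2^{n+1}}
$$

since $\sum_{i\ge1} i2^{-i} = 2$. Therefore, $\mathrm{card}\,A \le 2^{2^n}2^{2^{n+1}} \le N_{n+3}$. □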
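The exponent bookkeeping at the end of this proof is elementary but easy to get wrong; in fact $\sum_{k\le n-1}(n-k)2^k = 2^{n+1} - n - 2$ exactly. A short Python check with exact integers (the range of $n$ is an arbitrary choice) confirms this and the final comparison $\log_2\mathrm{card}\,A \le 2^n + 2^{n+1} \le 2^{n+3} = \log_2 N_{n+3}$:

```python
def product_exponent(n):
    """Exponent e with prod_{k <= n-1} (2^(n-k))^(2^k) = 2^e, i.e. e = sum_{k<n} (n-k) 2^k."""
    return sum((n - k) * 2 ** k for k in range(n))

def log2_card_bound(n):
    """log_2 of the bound 2^(2^n) * 2^(product_exponent(n)) on card A."""
    return 2 ** n + product_exponent(n)

for n in range(1, 8):
    # columns: n, exponent, 2^(n+1), log_2 of the bound on card A, log_2 N_{n+3} = 2^(n+3)
    print(n, product_exponent(n), 2 ** (n + 1), log2_card_bound(n), 2 ** (n + 3))
```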


Lemma 2.13.7 We have

$$
e_{n+3}(\mathcal{E}_n) \le 2\max_{k\le n}(a_{2^k}2^{k-n}) . \tag{2.165}
$$

Proof Assume now that $A$ is as large as possible under (2.163). Then the balls centered at points of $A$ and with radius $2\varepsilon$ cover $\mathcal{E}_n$, for otherwise we could add a point to $A$. Since $\mathrm{card}\,A \le N_{n+3}$, we have $e_{n+3}(\mathcal{E}_n) \le 2\varepsilon$. □
Combining (2.165) with (2.162), we obtain
Corollary 2.13.8 We have

$$
e_{n+3}(\mathcal{E}) \le 3\max_{k\le n}(a_{2^k}2^{k-n}) . \tag{2.166}
$$

Proof of Proposition 2.13.2 We have, using (2.166),

$$
\sum_{n\ge3} 2^{n/2}e_n(\mathcal{E}) = \sum_{n\ge0} 2^{(n+3)/2}e_{n+3}(\mathcal{E}) \le L\sum_{n\ge0} 2^{n/2}\sum_{k\le n} 2^{k-n}a_{2^k} = L\sum_{k\ge0} 2^ka_{2^k}\sum_{n\ge k} 2^{-n/2} \le L\sum_{k\ge0} 2^{k/2}a_{2^k} .
$$

Since $\mathcal{E}$ is contained in the ball centered at the origin with radius $a_1$, we have $e_n(\mathcal{E}) \le a_1$ for each $n$. The result follows. □


2.14 Rolling Up Our Sleeves: Chaining on Ellipsoids 


Let us recall the ellipsoid $\mathcal{E}$ of (2.154). We have proved (2.159) as a consequence of the majorizing measure theorem, Theorem 2.10.1. We will later give a more geometrical proof of this result. In the present section, we demonstrate the hard way that these results are deep, by explicitly constructing a chaining on the ellipsoid $\mathcal{E}$.³¹ This is surprisingly non-trivial.³² Let us assume that the sequence $(a_i)$ is non-increasing, and for $n \ge 0$, let $I_n = \{i;\ 2^n \le i < 2^{n+1}\}$ so that $\mathrm{card}\,I_n = 2^n$ and

$$
\mathcal{E} \subset \Big\{t;\ \sum_{n\ge0}\frac{1}{a_{2^n}^2}\sum_{i\in I_n} t_i^2 \le 1\Big\} = \Big\{t;\ \sum_{n\ge0}\frac{2^n}{c_n}\sum_{i\in I_n} t_i^2 \le 1\Big\} , \tag{2.167}
$$

where $c_n = 2^na_{2^n}^2$. Furthermore (as in the previous section), $\sum_{n\ge0} c_n \le 3\sum_{i\ge1} a_i^2$. For such an ellipsoid $\mathcal{E}'$, we will construct sets $U_n \subset \ell^2$ with $\mathrm{card}\,U_n \le N_{n+n_0}$ (where $n_0$ is a universal constant) such that

$$
\forall t\in\mathcal{E}',\quad \sum_{n\ge0} 2^{n/2}d(t, U_n) \le L\Big(\sum_{n\ge0} c_n\Big)^{1/2} . \tag{2.168}
$$

31 There are obvious similarities between this section and Sect. 2.6. It is a good challenge to figure out by yourself how to do the chaining on ellipsoids after having studied Sect. 2.6.

32 I am grateful to Dali Liu for having suggested to include this section.
Let us now deduce from this result how to perform the chaining on the ellipsoid $\mathcal{E}$ of (2.154). As we have just seen, such an ellipsoid is contained in an ellipsoid $\mathcal{E}'$ of the type (2.167) for which $\sum_{n\ge0} c_n \le L\sum_{i\ge1} a_i^2$. Consider the sets $U_n \subset \ell^2$ as in (2.168). Consider a map $\varphi : \ell^2 \to \mathcal{E}$ such that $d(x, \varphi(x)) \le 2d(x, \mathcal{E})$, and observe that for $t \in \mathcal{E}$ and $x \in \ell^2$ we have $d(x, \varphi(x)) \le 2d(x, t)$ so that $d(t, \varphi(x)) \le d(t, x) + d(x, \varphi(x)) \le 3d(t, x)$. Consequently, $d(t, \varphi(U_n)) \le 3d(t, U_n)$. The sets $\varphi(U_n) \subset \mathcal{E}$ satisfy $\mathrm{card}\,\varphi(U_n) \le \mathrm{card}\,U_n \le N_{n+n_0}$. We define $T_n = \{0\}$ for $n \le n_0$ and $T_n = \varphi(U_{n-n_0})$ for $n > n_0$. Thus, $\mathrm{card}\,T_n \le N_n$ and (2.168) implies

$$
\forall t\in\mathcal{E},\quad \sum_{n\ge0} 2^{n/2}d(t, T_n) \le L\Big(\sum_{i\ge1} a_i^2\Big)^{1/2} .
$$


We now prepare for the construction of the sets $U_n$. There is no loss of generality to assume that $\sum_{n\ge0} c_n = 1$.


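Lemma 2.14.1 Given $t \in \mathcal{E}'$, we can find a sequence $(p(n,t))_{n\ge0}$ of integers with the following properties:

$$
\forall n\ge0,\quad \sum_{i\in I_n} t_i^2 \le 2^{-p(n,t)} , \tag{2.169}
$$

$$
\sum_{n\ge0} 2^{n/2 - p(n,t)/2} \le L , \tag{2.170}
$$

$$
\forall n\ge0,\quad p(n+1,t) \le p(n,t) + 2 . \tag{2.171}
$$

Proof Define $q(n,t)$ as the largest integer $q \le 2n$ such that $\sum_{i\in I_n} t_i^2 \le 2^{-q}$. Let $A = \{n\ge0;\ q(n,t) < 2n\}$, so that for $n \in A$, by definition of $q(n,t)$, we have $2^{-q(n,t)} \le 2\sum_{i\in I_n} t_i^2$. Thus, since $t \in \mathcal{E}'$,

$$
\sum_{n\in A}\frac{2^{n-q(n,t)}}{c_n} \le 2\sum_{n\ge0}\frac{2^n}{c_n}\sum_{i\in I_n} t_i^2 \le 2 .
$$

Since $\sum_n c_n = 1$, then $\sum_{n\in A} 2^{n/2 - q(n,t)/2} \le L$ by the Cauchy-Schwarz inequality. Since $2^{n/2 - q(n,t)/2} = 2^{-n/2}$ for $n \notin A$, we have $\sum_{n\ge0} 2^{n/2 - q(n,t)/2} \le L$. We define now $p(n,t) = \min\{q(k,t) + 2(n-k);\ 0\le k\le n\}$ so that (2.171) holds. Also, since $2^{-p(n,t)/2} \le \sum_{k\le n} 2^{-q(k,t)/2 - (n-k)}$, we obtain

$$
\sum_{n\ge0} 2^{n/2 - p(n,t)/2} \le \sum_{n\ge0}\sum_{k\le n} 2^{k/2 - q(k,t)/2}2^{-(n-k)/2} = \sum_{k\ge0} 2^{k/2 - q(k,t)/2}\sum_{n\ge k} 2^{-(n-k)/2} \le L .
$$

□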
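The construction of $q(n,t)$ and $p(n,t)$ is fully explicit, so it can be run on data. The Python sketch below works only with the block energies $E_n = \sum_{i\in I_n} t_i^2$ and uses exact rational arithmetic (`fractions.Fraction`) to avoid rounding in the comparisons $E_n \le 2^{-q}$. The weights $c_n \propto (n+1)^{-2}$, the number of blocks and the energies (chosen so that $t$ lies on the boundary of $\mathcal{E}'$) are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def q_sequence(energies):
    """q(n) = largest integer q <= 2n with E_n <= 2^(-q), where E_n = sum_{i in I_n} t_i^2 > 0."""
    qs = []
    for n, e in enumerate(energies):
        q = 2 * n
        while e > Fraction(2) ** (-q):  # exact rational comparison, no rounding
            q -= 1
        qs.append(q)
    return qs

def p_sequence(qs):
    """p(n) = min_{0 <= k <= n} (q(k) + 2(n - k)), which forces p(n+1) <= p(n) + 2."""
    return [min(qs[k] + 2 * (n - k) for k in range(n + 1)) for n in range(len(qs))]

# Illustrative data: N blocks, c_n proportional to 1/(n+1)^2 with sum_n c_n = 1, and
# energies with sum_n (2^n / c_n) E_n = 1, so that t lies on the boundary of E'.
N = 20
w = [Fraction(1, (n + 1) ** 2) for n in range(N)]
c = [x / sum(w) for x in w]
energies = [c[n] / (2 ** n * N) for n in range(N)]

qs = q_sequence(energies)
ps = p_sequence(qs)
chain_sum = sum(2.0 ** (n / 2 - p / 2) for n, p in enumerate(ps))
print("p(n):", ps)
print("sum_n 2^(n/2 - p(n)/2) =", round(chain_sum, 4))
```

Tracking the constants in the proof gives $\sum_n 2^{n/2-q(n,t)/2} \le \sqrt2 + (1 - 2^{-1/2})^{-1}$ and then $\sum_n 2^{n/2-p(n,t)/2} \le 17$, which is the bound checked below.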
For each $n \ge 1$ and $p \ge 0$, consider the set $B(n,p) \subset \ell^2$ which consists of the $t = (t_i)_{i\ge1}$ such that $t_i = 0$ if $i \ge 2^n$ and $\|t\|_2 \le 2^{-p/2+2}$. This is a ball of dimension $2^n - 1$ and radius $2^{-p/2+2}$. Using (2.47) for $\varepsilon = 1/4$, there is a set $V_{n,p} \subset B(n,p)$ with $\mathrm{card}\,V_{n,p} \le L^{2^n}$ such that every point of $B(n,p)$ is within distance $\le 2^{-p/2}$ of $V_{n,p}$. We consider the set $V_n = \bigcup_{0\le p\le2n} V_{n,p}$ so that $\mathrm{card}\,V_n \le L^{2^n}$. We set $U_0 = \{0\}$, and for $n \ge 1$ we consider the sets $U_n$ consisting of the elements $x_1 + \cdots + x_n$ where $x_k \in V_k$.

For $t \in \ell^2$ and $n \ge 0$, we define $t^{(n)} \in \ell^2$ by $t^{(n)}_i = t_i$ if $i < 2^n$ and $t^{(n)}_i = 0$ if $i \ge 2^n$. Note that $t^{(0)} = 0$ for $t \in \ell^2$.


Lemma 2.14.2 For $t \in \mathcal{E}'$, consider the sequence $(p(n,t))$ of Lemma 2.14.1. Then for each $n$, we can find $u(n) \in U_n$ such that $d(u(n), t^{(n)}) \le 2^{-p(n,t)/2}$.

Proof The proof is by induction over $n$. For $n = 0$, it suffices to take $u(0) = 0$ since $t^{(0)} = 0$. For the induction step from $n$ to $n+1$, we have $t^{(n)} = u(n) + v(n)$ where $u(n) \in U_n$ and $\|v(n)\|_2 \le 2^{-p(n,t)/2}$, so that $t^{(n+1)} = u(n) + v'(n)$ where $v'(n) = v(n) + t^{(n+1)} - t^{(n)}$. By (2.169), $\|t^{(n+1)} - t^{(n)}\|_2 = (\sum_{i\in I_n} t_i^2)^{1/2} \le 2^{-p(n,t)/2}$. Thus, $\|v'(n)\|_2 \le 2^{-p(n,t)/2+1} \le 2^{-p(n+1,t)/2+2}$, where we have used (2.171) in the second inequality. Since $v(n)_i = 0$ for $i \ge 2^n$, we have $v'(n)_i = 0$ for $i \ge 2^{n+1}$, so that $v'(n) \in B(n+1, p(n+1,t))$. Thus, there is an element $w \in V_{n+1,p(n+1,t)} \subset V_{n+1}$ for which $\|v'(n) - w\|_2 \le 2^{-p(n+1,t)/2}$. Setting $u(n+1) := u(n) + w \in U_{n+1}$, we then have $t^{(n+1)} - u(n+1) = v'(n) - w$. □


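Corollary 2.14.3 For $t \in \mathcal{E}'$ we have $\sum_{n\ge0} 2^{n/2}d(t, U_n) \le L$.
Proof Recalling (2.169), we have

$$
\|t - t^{(n)}\|_2^2 = \sum_{k\ge n}\sum_{i\in I_k} t_i^2 \le \sum_{k\ge n} 2^{-p(k,t)} ,
$$

so that $\|t - t^{(n)}\|_2 \le \sum_{k\ge n} 2^{-p(k,t)/2}$. Then

$$
\sum_{n\ge0} 2^{n/2}\|t - t^{(n)}\|_2 \le \sum_{n\ge0}\sum_{k\ge n} 2^{n/2}2^{-p(k,t)/2} = \sum_{k\ge0} 2^{-p(k,t)/2}\sum_{0\le n\le k} 2^{n/2} \le L\sum_{k\ge0} 2^{k/2 - p(k,t)/2} \le L ,
$$

using (2.170) in the last inequality. Since $d(t, U_n) \le d(t^{(n)}, U_n) + \|t - t^{(n)}\|_2$, the result follows, using also Lemma 2.14.2. □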
2.15 Continuity of Gaussian Processes


By far, the most important result concerning continuity of Gaussian processes is Dudley’s bound (1.19). However, since the finiteness of the right-hand side of (1.19) is not necessary for the Gaussian process to be continuous, there are situations where this bound is not appropriate.³³ In the present section, we show that a suitable form of the generic chaining allows us to capture the exact modulus of continuity of a Gaussian process with respect to its canonical distance in full generality. Not surprisingly, the modulus of continuity is closely related to the rate at which the series $\sum_n 2^{n/2}\Delta(A_n(t))$ converges uniformly on $T$ for a suitable admissible sequence $(\mathcal{A}_n)$. Our first result shows how to obtain a modulus of continuity using the generic chaining.


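Lemma 2.15.1 Consider a metric space $(T,d)$ and a process $(X_t)_{t\in T}$ which satisfies the increment condition (2.4):

$$
\forall u > 0,\quad P(|X_s - X_t| \ge u) \le 2\exp\Big(-\frac{u^2}{2d(s,t)^2}\Big) .
$$

Assume that there exists a sequence $(T_n)$ of subsets of $T$ with $\mathrm{card}\,T_n \le N_n$ such that for a certain integer $m$ and a certain number $B$ one has

$$
\sup_{t\in T}\sum_{n\ge m} 2^{n/2}d(t, T_n) \le B . \tag{2.172}
$$

Consider $\delta > 0$. Then, for any $u \ge 1$, with probability $\ge 1 - \exp(-u^22^m)$, we have

$$
\forall s,t\in T,\quad d(s,t) \le \delta \Rightarrow |X_s - X_t| \le Lu(2^{m/2}\delta + B) . \tag{2.173}
$$

Proof We assume $T$ finite for simplicity. For $n \ge m$ and $t \in T$, denote by $\pi_n(t)$ an element of $T_n$ such that $d(t, \pi_n(t)) = d(t, T_n)$. Consider the event $\Omega_u$ defined by³⁴

$$
\forall n\ge m+1,\ \forall t\in T,\quad |X_{\pi_{n-1}(t)} - X_{\pi_n(t)}| \le Lu2^{n/2}d(\pi_{n-1}(t), \pi_n(t)) \tag{2.174}
$$

and

$$
\forall t, t'\in T_m,\quad |X_t - X_{t'}| \le Lu2^{m/2}d(t, t') . \tag{2.175}
$$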


33 In practice, however, as of today, the Gaussian processes for which continuity is important can 
be handled through Dudley’s bound, while for those which cannot be handled through this bound 
(such as in Chap. 4), it is boundedness which matters. For this reason, the considerations of the 
present section are of purely theoretical interest and may be skipped at first reading. 


34 We are again following the general method outlined at the end of Sect. 2.4. 
Then as in Sect. 2.4, we have $\mathrm{P}(\Omega_u)\ge 1-\exp(-u^2 2^m)$. Now, when $\Omega_u$ occurs, using chaining as usual and (2.174), we get
$$\forall t\in T,\quad |X_t-X_{\pi_m(t)}|\le LuB. \tag{2.176}$$
Moreover, using (2.172) in the inequality, $d(t,\pi_m(t))=d(t,T_m)\le B2^{-m/2}$, so that, using (2.175),
$$d(s,t)\le\delta\;\Rightarrow\;d(\pi_m(s),\pi_m(t))\le\delta+2B2^{-m/2}\;\Rightarrow\;|X_{\pi_m(s)}-X_{\pi_m(t)}|\le Lu\big(\delta2^{m/2}+B\big).$$
Combining with (2.176) proves that $|X_s-X_t|\le Lu(\delta2^{m/2}+B)$ and completes the proof. □
Exercise 2.15.2 Deduce Dudley’s bound (1.19) from Lemma 2.15.1. 


We now turn to our main result, which exactly describes the modulus of 
continuity of a Gaussian process in terms of certain admissible sequences. It implies 
in particular the remarkable fact (discovered by X. Fernique) that for Gaussian 
processes the “local modulus of continuity” (as in (2.177)) is also “global”. 


Theorem 2.15.3 There exists a constant $L^*$ with the following property. Consider a Gaussian process $(X_t)_{t\in T}$, with canonical associated distance $d$ given by (0.1). Assume that $S=\mathrm{E}\sup_{t\in T}X_t<\infty$. For $k\ge 1$, consider $\delta_k>0$, and assume that
$$\forall t\in T;\quad \mathrm{E}\sup_{\{s\in T;\,d(s,t)\le\delta_k\}}(X_s-X_t)\le 2^{-k}S. \tag{2.177}$$
Let $n_0=0$, and for $k\ge 1$, consider an integer $n_k$ for which
$$L^*S2^{-n_k/2-k}\le\delta_k. \tag{2.178}$$
Then we can find an admissible sequence $(\mathcal{A}_n)$ of partitions of $T$ such that
$$\forall k\ge 0;\quad \sup_{t\in T}\sum_{n\ge n_k}2^{n/2}\Delta(A_n(t))\le LS2^{-k}. \tag{2.179}$$
Conversely, given integers $n_k$ and an admissible sequence $(\mathcal{A}_n)$ as in (2.179) and defining now $\delta_k=S2^{-n_k/2-k}$, with probability $\ge 1-\exp(-u^2 2^{n_k})$, we have
$$\sup_{\{s,t\in T;\,d(s,t)\le\delta_k\}}|X_s-X_t|\le LuS2^{-k}. \tag{2.180}$$


The abstract formulation here might make it hard at first to feel the power of 
the statement. The numbers $\delta_k$ control the (local) modulus of continuity of the process. The numbers $n_k$ control the uniform convergence (over $t$) of the series $\sum_n 2^{n/2}\Delta(A_n(t))$. They relate to each other by the relation $\delta_k\approx S2^{-n_k/2-k}$. The second part of the theorem asserts that in turn the numbers $n_k$ control the uniform modulus of continuity (2.180).
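The relation between $\delta_k$ and $n_k$ is plain arithmetic, sketched below in Python with hypothetical helper names. The constant $L^*$ is only asserted to exist, so its numerical value here is an assumed input. The first function returns the smallest $n_k$ allowed by (2.178); the second returns the $\delta_k$ of the converse part.

```python
def smallest_nk(L_star: float, S: float, k: int, delta_k: float) -> int:
    """Smallest integer n >= 0 with L_star * S * 2^(-n/2 - k) <= delta_k, i.e. (2.178)."""
    if delta_k <= 0:
        raise ValueError("delta_k must be positive")
    n = 0
    while L_star * S * 2.0 ** (-n / 2 - k) > delta_k:
        n += 1
    return n


def converse_delta(S: float, n_k: int, k: int) -> float:
    """delta_k = S * 2^(-n_k/2 - k), the radius used in the converse part (2.180)."""
    return S * 2.0 ** (-n_k / 2 - k)


n1 = smallest_nk(L_star=4.0, S=1.0, k=1, delta_k=1 / 16)
print(n1, converse_delta(1.0, n1, 1))  # prints 10 0.015625
```

Since $L^*S2^{-n_k/2-k}\le\delta_k$, the radius used in the converse part is at most $\delta_k/L^*$, so the two halves of the theorem match up to a universal constant.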


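Proof According to the majorizing measure theorem and specifically (2.114), there exists a constant $L^*$ such that for each subset $U$ of $T$ there exists an admissible sequence $(\mathcal{A}_n)$ of partitions of $U$ such that
$$\forall t\in U,\quad\sum_{n\ge 0}2^{n/2}\Delta(A_n(t))\le\frac{L^*}{4}\,\mathrm{E}\sup_{s\in U}X_s. \tag{2.181}$$
(Since $L^*$ may be taken as large as we like, the factor $1/4$ costs nothing; it absorbs the factor $\sqrt2$ created by the shift of indices in the induction step.)

Assuming (2.177), by induction over $k$, we construct an admissible sequence $(\mathcal{A}_n)_{n\le n_k}$ such that
$$1\le p\le k\;\Rightarrow\;\sup_{t\in T}\sum_{n_{p-1}<n\le n_p}2^{n/2}\Delta(A_n(t))\le L^*S2^{-p}. \tag{2.182}$$
For $k=1$, the existence of the sequence $(\mathcal{A}_n)_{n\le n_1}$ follows from the Majorizing Measure Theorem through (2.181) as explained, so we turn to the induction step from $k$ to $k+1$. Using (2.182) for $p=k$, we deduce that for each $t\in T$, $2^{n_k/2}\Delta(A_{n_k}(t))\le L^*S2^{-k}$, so that $\Delta(A_{n_k}(t))\le L^*S2^{-n_k/2-k}\le\delta_k$, using (2.178) in the last inequality. Consequently, for any element $C$ of $\mathcal{A}_{n_k}$, we have $\Delta(C)\le\delta_k$, so that considering any point $t$ of $C$ we have, using (2.177) in the last inequality,
$$\mathrm{E}\sup_{s\in C}X_s=\mathrm{E}\sup_{s\in C}(X_s-X_t)\le\mathrm{E}\sup_{\{s\in T;\,d(s,t)\le\delta_k\}}(X_s-X_t)\le 2^{-k}S.$$
Using the majorizing measure theorem, we construct for each $C\in\mathcal{A}_{n_k}$ an admissible sequence $(\mathcal{A}_{C,n})_{n\ge 0}$ of partitions of $C$ for which
$$\forall t\in C,\quad\sum_{n\ge 0}2^{n/2}\Delta(A_{C,n}(t))\le L^*S2^{-k-2}. \tag{2.183}$$
For $n_k<n\le n_{k+1}$, we simply define $\mathcal{A}_n$ as the collection of all sets in one of the partitions $\mathcal{A}_{C,n-1}$ where $C\in\mathcal{A}_{n_k}$, so that $\operatorname{card}\mathcal{A}_n\le N_{n-1}\operatorname{card}\mathcal{A}_{n_k}\le N_{n-1}^2\le N_n$. Since for $t\in C$ we have $A_n(t)=A_{C,n-1}(t)$, it follows from (2.183) that for any $C\in\mathcal{A}_{n_k}$, we have
$$\sup_{t\in C}\sum_{n_k<n\le n_{k+1}}2^{n/2}\Delta(A_n(t))\le\sqrt2\,\sup_{t\in C}\sum_{n\ge 0}2^{n/2}\Delta(A_{C,n}(t))\le\sqrt2\,L^*S2^{-k-2}\le L^*S2^{-k-1}.$$
This completes the induction and the construction of the sequence $(\mathcal{A}_n)$, since (2.182) implies (2.179).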
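It remains to prove the “conversely” part. For this, for each $n\ge 0$, we simply consider a subset $T_n$ of $T$ such that
$$\forall A\in\mathcal{A}_n,\quad\operatorname{card}(T_n\cap A)=1.$$
Then $\operatorname{card}T_n\le N_n$ and $d(t,T_n)\le\Delta(A_n(t))$, so that (2.179) yields (2.172) for $m=n_k$ and $B=LS2^{-k}$. For each $k$, we then use Lemma 2.15.1 for $m=n_k$, $B=LS2^{-k}$ and $\delta=\delta_k$, noting that $2^{n_k/2}\delta_k=S2^{-k}$. □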


Key Ideas to Remember³⁵


• The generic chaining efficiently organizes the standard chaining argument for processes whose increments have Gaussian-like tails governed by a distance $d$ as in (2.4).

• The generic chaining applied to such processes motivates the introduction of our main measure $\gamma_2(T,d)$ of the size of a metric space $(T,d)$. This measure involves the existence of suitable sequences of partitions.

• The fundamental problem then becomes how to construct such sequences of partitions in a metric space.

• There is a machine (called a partitioning scheme) to construct such sequences of partitions. The input to the machine is a functional, a function of the subsets of our basic metric space, which in a sense is a measure of their size. The existence of such functionals with specific growth properties is intrinsically linked to the existence of such sequences of partitions.

• The majorizing measure theorem is the statement that for a Gaussian process with index set $T$ and canonical distance $d$ the quantity $\mathrm{E}\sup_{t\in T}X_t$ is exactly of order $\gamma_2(T,d)$. The proof relies on a partitioning scheme, used for the functional $F(A)=\mathrm{E}\sup_{t\in A}X_t$. Sudakov minoration and concentration of measure are the main tools to prove that this functional satisfies the required growth condition.

• Gaussian processes can be seen as subsets of a standard Hilbert space, but the geometric understanding that would relate the size of a set with the size of its convex hull is still lacking.

• The traditional way to organize chaining uses entropy numbers. Even for sets as basic as ellipsoids in Hilbert space, entropy numbers provide only a suboptimal description of their size.


2.16 Notes and Comments 


I have heard people saying that the problem of characterizing continuity and 
boundedness of Gaussian processes goes back (at least implicitly) to Kolmogorov. 
The understanding of Gaussian processes was long delayed by the fact that in 


35 The function of this brief summary is not to explain the material again, but is a way for the 
reader to check that she did understand the main ideas. If any of the points made below is not clear 
to the reader, she may not be ready to proceed and may want to review the corresponding material. 
the most immediate examples the index set is a subset of $\mathbb{R}$ or $\mathbb{R}^n$ and that the temptation to use the special structure of this index set is nearly irresistible. Probably the single most important conceptual progress about Gaussian processes was the realization, in the late 1960s, that the boundedness of a (centered) Gaussian process is determined by the structure of the metric space $(T,d)$, where $d$ is the usual distance $d(s,t)=(\mathrm{E}(X_s-X_t)^2)^{1/2}$. It is difficult now to realize what a tremendous jump in understanding this was, since this seems so obvious a posteriori.

In 1967, R. Dudley obtained the inequality (2.38). (As he pointed out, R. Dudley 
did not state (2.38) though he performed all the essential steps and (2.38) totally 
deserves to be called Dudley’s bound.) A few years later, X. Fernique proved that 
in the “stationary case”, Dudley’s inequality can be reversed [32], i.e., he proved in 
that case the lower bound of Theorem 2.10.1. This historically important result was 
central to the work of Marcus and Pisier [61, 62] who built on it to solve all the 
classical problems on random Fourier series. Some of their results will be presented 
in Chap. 7. Interestingly, now that the right approach has been found, the proof of 
Fernique’s result is not really easier than that of Theorem 2.10.1. 

Another major contribution of Fernique (building on earlier ideas of C. Preston) 
was an improvement of Dudley’s bound based on a new tool called majorizing 
measures (which we will study in Sect. 3.1.3). Fernique conjectured that his bound 
was essentially optimal. Gilles Pisier suggested in 1983 that I should work on this 
conjecture. In my first attempt, I proved fast that Fernique’s conjecture held in the 
case where the metric space $(T,d)$ is ultrametric. I learned that Fernique had already
done this, so I was discouraged for a while. In the second attempt, I tried to decide 
whether a majorizing measure existed on ellipsoids. I had the hope that some simple 
density with respect to the volume measure would work. It was difficult to form any 
intuition, and I struggled in the dark for months. At some point, I tried a combination 
of suitable point masses and easily found a direct construction of the majorizing 
measure on ellipsoids. This made it believable that Fernique’s conjecture was true, 
but I still tried to disprove it. Then I realized that I did not understand why a direct 
approach using a partitioning scheme should fail, while this understanding should 
be useful to construct a counterexample. Once I tried this direct approach, it was a 
matter of 3 days to prove Fernique’s conjecture. Gilles Pisier made two comments 
about this discovery. The first one was “you are lucky”, by which he meant that I 
was lucky that Fernique’s conjecture was true, since a counterexample would have
been of limited interest. I am grateful to this day for his second comment: “I wish I
had proved this myself, but I am very glad you did it”.

Fernique’s concept of majorizing measures is difficult to grasp and was dismissed 
by the main body of probabilists as a mere curiosity. (I myself found it very difficult 
to understand.) This could be the main reason why Fernique’s pathbreaking work 
did not receive the recognition it should have. I have tried to repair this and to 
express my personal admiration by dedicating this book to his memory and by 
paying homage to his work at numerous places in this book. 

In 2000, while discussing one of the open problems of this book with Keith Ball 
(be he blessed for his interest in it!), I discovered that one could replace majorizing 
measures by the totally natural variation on the usual chaining arguments that was 
presented here. That this was not discovered much earlier is a striking illustration
of the inefficiency of my brain. For two decades, it looked like majorizing measures
would not be of any use anymore, but they now play a major role again for
reasons to be explained in Chap. 5.

In [111], the author presented a particularly simple proof of Theorem 2.10.1 
(expressed in terms of majorizing measures since the generic chaining had not been 
invented yet). It is based on a partition scheme related to the one we use here. The 
precise relationship is discussed on page 72 of [132]. 

It is on purpose that I did not stress Slepian’s lemma, which is the statement 
that (2.127) holds for L = 1. This lemma is very specific to Gaussian processes, 
and focusing on it seems a good way to guarantee that one will never move beyond 
these. One notable progress I made was to discover (ages ago) the scheme of proof 
of Proposition 2.10.8 that dispenses with Slepian’s lemma and that we shall use in 
many situations. Comparison results such as Slepian’s lemma are not at the root of 
results such as the majorizing measure theorem, but rather are (at least qualitatively) 
a consequence of them as in Corollary 2.10.12. This being said, Slepian’s lemma is 
historically very important as it crystallizes the link between $\mathrm{E}\sup_{t\in T}X_t$ and the
structure of the metric space $(T,d)$.


Chapter 3
Trees and Other Measures of Size


In this chapter, we systematically investigate different ways to measure the size of a 
metric space. One of them, Fernique’s functional of Sect. 3.3, will play a major role
in the sequel, as it is the form which lends itself to vast generalizations. The concept 
of a tree presented in Sect. 3.1 is historically important: the author discovered many 
of the results he presents while thinking in terms of trees. We know now how to 
present these results and their proofs without ever mentioning trees, arguably in a 
more elegant fashion, so that trees are not used explicitly elsewhere in this book. 
However, it might be too early to dismiss this concept, at least as an instrument of 
discovery. 


3.1 Trees 


We shall describe different ways to measure the size of a metric space and show that
they are all equivalent to the functional $\gamma_2(T,d)$.¹

In a nutshell, a tree is a certain structure that requires a “lot of space” to be 
constructed, so that a metric space needs to be large in order to contain large trees. At 
the simplest level, it already takes some space to construct in a set $A$ sets $B_1,\dots,B_N$
which are appropriately separated from each other. This is even more so if the sets
$B_1,\dots,B_N$ are themselves large (e.g., because they contain many sets far from each
other). Trees are a proper formulation of the iteration of this idea. The basic use of 
trees is to measure the size of a metric space by the size of the largest tree (of a 
certain type) which it contains. Different types of trees yield different measures of 
size. 


1 It is possible to consider more general notions corresponding to other functionals considered in
the book, but for simplicity we consider only the case of $\gamma_2$.
A tree $\mathcal{T}$ of a metric space $(T,d)$ is a finite collection of non-empty subsets of $T$ with the following two properties:
$$\text{Given }A,B\text{ in }\mathcal{T},\text{ if }A\cap B\ne\emptyset,\text{ then either }A\subset B\text{ or else }B\subset A. \tag{3.1}$$
$$\mathcal{T}\text{ has a largest element.} \tag{3.2}$$


The important condition here is (3.1), and (3.2) is just for convenience. 
If $A,B\in\mathcal{T}$ and $B\subset A$, $B\ne A$, we say that $B$ is a child of $A$ if
$$C\in\mathcal{T},\ B\subset C\subset A\;\Rightarrow\;C=B\ \text{or}\ C=A. \tag{3.3}$$


We denote by c(A) the number of children of A. Since our trees are finite, some 
of their sets will have no children. It is convenient to “shrink these sets to a single 
point”, so we will consider only trees with the following property: 


$$\text{If }A\in\mathcal{T}\text{ and }c(A)=0,\text{ then }A\text{ contains exactly one point.} \tag{3.4}$$


A fundamental property of trees is as follows: consider trees $\mathcal{T}_1,\dots,\mathcal{T}_m$, and for $1\le\ell\le m$, let $A_\ell$ be the largest element of $\mathcal{T}_\ell$. Assume that the sets $A_\ell$ are disjoint, and consider a set $A$ with $\bigcup_{\ell\le m}A_\ell\subset A\subset T$. Then the collection of subsets of $T$ consisting of $A$ and of $\bigcup_{\ell\le m}\mathcal{T}_\ell$ is a tree. The proof is straightforward. This fact allows one to construct iteratively more and more complicated (and larger) trees.

An important structure in a tree is a branch. A sequence $A_0,A_1,\dots,A_k$ is a branch if $A_{\ell+1}$ is a child of $A_\ell$ and if moreover $A_0$ is the largest element of $\mathcal{T}$ while $A_k$ has no child. Then by (3.4), the set $A_k$ is reduced to a single point $t$, and $A_0,\dots,A_k$ are exactly those elements of $\mathcal{T}$ which contain $t$. So in order to describe the branches of $\mathcal{T}$, it is convenient to introduce the set
$$S_{\mathcal{T}}=\{t\in T;\ \{t\}\in\mathcal{T}\}, \tag{3.5}$$
which we call the support of $\mathcal{T}$. If a set $A$ in a tree has no child, one may call it a leaf. Thus, a leaf of a tree is reduced to one single point, and the support of a tree is the union of its leaves. By considering all the collections $\{A\in\mathcal{T};\ t\in A\}$ as $t$ varies in $S_{\mathcal{T}}$, we obtain all the branches of $\mathcal{T}$.
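The definitions (3.1) to (3.5) are easy to check mechanically. The following minimal Python sketch assumes (an illustration choice, not the text's) that a tree is stored as a set of `frozenset`s of points; all function names are hypothetical. It tests (3.1), (3.2) and (3.4), lists children as in (3.3), builds a larger tree with the fundamental property, and reads off the support and the branches.

```python
from itertools import combinations


def is_tree(tree: set) -> bool:
    """Check (3.1) (any two sets are nested or disjoint) and (3.2) (a largest element)."""
    sets = list(tree)
    if not sets or any(len(A) == 0 for A in sets):
        return False
    for A, B in combinations(sets, 2):
        if A & B and not (A <= B or B <= A):
            return False
    return frozenset().union(*sets) in tree


def children(tree: set, A: frozenset) -> list:
    """Children of A as in (3.3): proper subsets of A in the tree with nothing strictly between."""
    proper = [B for B in tree if B < A]
    return [B for B in proper if not any(B < C < A for C in proper)]


def leaves_are_points(tree: set) -> bool:
    """Condition (3.4): every set without children is a single point."""
    return all(len(A) == 1 for A in tree if not children(tree, A))


def graft(trees: list, A: frozenset) -> set:
    """The fundamental property: A together with trees whose largest elements are disjoint subsets of A."""
    tops = [max(t, key=len) for t in trees]
    assert all(not (X & Y) for X, Y in combinations(tops, 2)), "largest elements must be disjoint"
    assert all(X <= A for X in tops), "largest elements must lie in A"
    return {A}.union(*trees)


def support(tree: set) -> set:
    """S_T of (3.5): the points t with {t} in the tree, i.e. the union of the leaves."""
    return {next(iter(A)) for A in tree if len(A) == 1}


def branch(tree: set, t) -> list:
    """The branch through t: all elements containing t, from the largest element down to {t}."""
    return sorted((A for A in tree if t in A), key=len, reverse=True)


left = {frozenset({1, 2}), frozenset({1}), frozenset({2})}
right = {frozenset({3})}
big = graft([left, right], frozenset({1, 2, 3}))
print(is_tree(big), leaves_are_points(big), sorted(support(big)))  # True True [1, 2, 3]
```

Sorting a branch by cardinality is enough because, by (3.1), the elements containing a given point form a chain.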


3.1.1 Separated Trees 


We now quantify our desired property that the children of a given set should be far from each other in an appropriate sense. A separated tree is a tree $\mathcal{T}$ such that to each $A$ in $\mathcal{T}$ with $c(A)\ge 1$ is associated an integer $s(A)\in\mathbb{Z}$ with the following properties: First,
$$\text{If }B_1\text{ and }B_2\text{ are distinct children of }A,\text{ then }d(B_1,B_2)\ge 4^{-s(A)}. \tag{3.6}$$


Here, $d(B_1,B_2)=\inf\{d(x_1,x_2);\ x_1\in B_1,\ x_2\in B_2\}$. We observe that in (3.6), we make no restriction on the diameter of the children of $A$, see Fig. 3.1. (Such restrictions will, however, occur in the other notion of tree that we consider later.) Second, to rule out pathologies, we will also make the following purely technical assumption:
$$\text{If }B\text{ is a child of }A,\text{ then }s(B)>s(A). \tag{3.7}$$


An example of separated tree is shown in Fig. 3.1. To measure the size of a separated tree $\mathcal{T}$, we introduce its depth, i.e.,
$$\rho(\mathcal{T}):=\inf_{t\in S_{\mathcal{T}}}\sum_{t\in A\in\mathcal{T}}4^{-s(A)}\sqrt{\log c(A)}. \tag{3.8}$$


Here and below, we make the convention that the summation does not include the term $A=\{t\}$ (for which $c(A)=0$). The quantity (3.8) takes into account both the separation between the children of $A$ (through the term $4^{-s(A)}$) and their number (through the term $\sqrt{\log c(A)}$). This will be a common feature of all our notions of sizes of trees.

Fig. 3.1 A separated tree. The children of $A_0$ are $A_1$ and $B$. The children of $A_1$ are $A_2$ and $C$. $A_0$, $A_1$, $A_2$, and $A_3$ form a branch, of which $A_3$ is a leaf
We observe that in (3.8), we have the infimum over $t\in S_{\mathcal{T}}$. In words,

A tree is large if it is large along every branch.

We can then measure the size of $T$ by
$$\sup\{\rho(\mathcal{T});\ \mathcal{T}\text{ separated tree}\subset T\}. \tag{3.9}$$
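For a finite tree, the depth (3.8) can be computed directly. The sketch below (hypothetical names; the tree, the labels $s(A)$ and the metric $d(x,y)=|x-y|$ on four real points are made-up example data) checks (3.6) and (3.7) and evaluates $\rho(\mathcal{T})$ as the minimum over the leaves of the sums along their branches.

```python
import math
from itertools import combinations


def children(tree, A):
    """Children of A in the sense of (3.3)."""
    proper = [B for B in tree if B < A]
    return [B for B in proper if not any(B < C < A for C in proper)]


def is_separated(tree, s, d):
    """Check (3.6) and (3.7) for labels s defined on the sets that have children."""
    for A in tree:
        ch = children(tree, A)
        if not ch:
            continue
        for B1, B2 in combinations(ch, 2):
            gap = min(d(x, y) for x in B1 for y in B2)  # d(B1, B2) for finite sets
            if gap < 4.0 ** (-s[A]):
                return False
        if any(children(tree, B) and s[B] <= s[A] for B in ch):
            return False
    return True


def separated_depth(tree, s):
    """rho(T) of (3.8): min over leaves t of sum_{t in A, c(A) >= 1} 4^{-s(A)} sqrt(log c(A))."""
    leaves = {next(iter(A)) for A in tree if len(A) == 1}
    best = math.inf
    for t in leaves:
        total = 0.0
        for A in tree:
            c = len(children(tree, A))
            if t in A and c >= 1:
                total += 4.0 ** (-s[A]) * math.sqrt(math.log(c))
        best = min(best, total)
    return best


def dist(x, y):
    return abs(x - y)


root = frozenset({0, 1, 10, 11})
tree = {root, frozenset({0, 1}), frozenset({10, 11})} | {frozenset({x}) for x in root}
s = {root: -1, frozenset({0, 1}): 0, frozenset({10, 11}): 0}
print(is_separated(tree, s, dist), separated_depth(tree, s))  # expected: True and 5*sqrt(log 2)
```

Here every branch passes through the root (weight $4^{1}$) and one set of weight $4^{0}$, each with two children, so all leaves give the same sum $5\sqrt{\log 2}$.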


3.1.2 Organized Trees 


The notion of separated tree we just considered is but one of many possible notions 
of trees, and it does not seem fundamental. Rather, the quantity (3.9) is used as 
a convenient intermediate technical step to prove the equivalence of several more 
important quantities. Let us now consider another notion of trees, which is more 
restrictive (and apparently much more important). An organized tree is a tree $\mathcal{T}$ such that to each $A\in\mathcal{T}$ with $c(A)\ge 1$ is associated an integer $j=j(A)\in\mathbb{Z}$ and points $t_1,\dots,t_{c(A)}$ with the properties that
$$1\le\ell<\ell'\le c(A)\;\Rightarrow\;4^{-j-1}\le d(t_\ell,t_{\ell'})\le 4^{-j+2} \tag{3.10}$$
and that each ball $B(t_\ell,4^{-j-2})$ contains exactly one child of $A$. In some sense, $4^{-j(A)}$ tells you at which scale the children of $A$ live. Please note that it may happen that $4^{-j(A)}$ is much smaller than $\Delta(A)$. An example of organized tree is drawn in Fig. 3.2.


If $B_1$ and $B_2$ are distinct children of $A$ in an organized tree, then (since they lie in balls of radius $4^{-j(A)-2}$ whose centers are at distance $\ge 4^{-j(A)-1}$)
$$d(B_1,B_2)\ge 4^{-j(A)-2}, \tag{3.11}$$
so that an organized tree is also a separated tree, with $s(A)=j(A)+2$, but the notion of organized tree is more restrictive. (For example, we have no control over the diameter of the children of $A$ in a separated tree.)

Fig. 3.2 An organized tree. Here $j(A_2)>j(A_1)$

We define the depth $\tau(\mathcal{T})$ of an organized tree by
$$\tau(\mathcal{T}):=\inf_{t\in S_{\mathcal{T}}}\sum_{t\in A\in\mathcal{T}}4^{-j(A)}\sqrt{\log c(A)}. \tag{3.12}$$


Another way to measure the size of $T$ is then
$$\sup\{\tau(\mathcal{T});\ \mathcal{T}\text{ organized tree}\subset T\}. \tag{3.13}$$


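If we simply view an organized tree $\mathcal{T}$ as a separated tree using (3.11), then $\rho(\mathcal{T})=\tau(\mathcal{T})/16$ (where $\rho(\mathcal{T})$ is the depth of $\mathcal{T}$ as a separated tree). Thus, we have shown the following:

Proposition 3.1.1 We have
$$\sup\{\tau(\mathcal{T});\ \mathcal{T}\text{ organized tree}\}\le 16\sup\{\rho(\mathcal{T});\ \mathcal{T}\text{ separated tree}\}. \tag{3.14}$$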
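The same computation with $j(A)$ in place of $s(A)$ gives $\tau(\mathcal{T})$, and replacing $j(A)$ by $j(A)+2$ gives the depth of the same tree viewed as a separated tree. In the sketch below (hypothetical names; example points on the real line chosen by hand so that (3.10) holds), the check `satisfies_3_10` interprets “contains exactly one child” as “exactly one child lies entirely in the closed ball”, an assumption of this illustration.

```python
import math
from itertools import combinations


def children(tree, A):
    """Children of A in the sense of (3.3)."""
    proper = [B for B in tree if B < A]
    return [B for B in proper if not any(B < C < A for C in proper)]


def satisfies_3_10(tree, j, centers, d):
    """Check (3.10) and the ball condition, given j(A) and the points t_1, ..., t_{c(A)}."""
    for A in tree:
        ch = children(tree, A)
        if not ch:
            continue
        pts = centers[A]
        if len(pts) != len(ch):
            return False
        lo, hi, rad = 4.0 ** (-j[A] - 1), 4.0 ** (-j[A] + 2), 4.0 ** (-j[A] - 2)
        if any(not lo <= d(u, v) <= hi for u, v in combinations(pts, 2)):
            return False
        for p in pts:  # exactly one child lies entirely in the closed ball B(p, rad)
            inside = [B for B in ch if all(d(x, p) <= rad for x in B)]
            if len(inside) != 1:
                return False
    return True


def depth(tree, scale):
    """min over leaves t of sum_{t in A, c(A) >= 1} 4^{-scale(A)} sqrt(log c(A))."""
    leaves = {next(iter(A)) for A in tree if len(A) == 1}
    return min(
        sum(4.0 ** (-scale[A]) * math.sqrt(math.log(len(children(tree, A))))
            for A in tree if t in A and children(tree, A))
        for t in leaves
    )


def dist(x, y):
    return abs(x - y)


left, right = frozenset({0, 0.5}), frozenset({8, 8.5})
root = left | right
tree = {root, left, right} | {frozenset({x}) for x in root}
j = {root: -2, left: 0, right: 0}
centers = {root: [0, 8], left: [0, 0.5], right: [8, 8.5]}
tau = depth(tree, j)
rho = depth(tree, {A: jA + 2 for A, jA in j.items()})  # s(A) = j(A) + 2
print(satisfies_3_10(tree, j, centers, dist), tau, rho)
```

Shifting every scale by 2 multiplies each term by $4^{-2}$, which is exactly the identity $\rho(\mathcal{T})=\tau(\mathcal{T})/16$ behind (3.14).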


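The next result provides the fundamental connection between trees and the functional $\gamma_2$.

Proposition 3.1.2 We have
$$\gamma_2(T,d)\le L\sup\{\tau(\mathcal{T});\ \mathcal{T}\text{ organized tree}\}. \tag{3.15}$$
Proof We consider the functional
$$F(A)=\sup\{\tau(\mathcal{T});\ \mathcal{T}\subset A,\ \mathcal{T}\text{ organized tree}\},$$
where we write $\mathcal{T}\subset A$ as a shorthand for “$\forall B\in\mathcal{T},\ B\subset A$”.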

Next we prove that this functional satisfies the growth condition (2.77) for $r=16$ whenever $a$ is of the type $16^{-j}$, for $c^*=1/L$. For this, consider $n\ge 1$ and $m=N_n$. Consider $j\in\mathbb{Z}$ and $t_1,\dots,t_m\in T$ with
$$1\le\ell<\ell'\le m\;\Rightarrow\;16^{-j}\le d(t_\ell,t_{\ell'})\le 2\cdot 16^{-j+1}. \tag{3.16}$$
Consider sets $H_\ell\subset B(t_\ell,2\cdot 16^{-j-1})$ and $\alpha<\min_{\ell\le m}F(H_\ell)$. Consider, for $\ell\le m$, an organized tree $\mathcal{T}_\ell\subset H_\ell$ with $\tau(\mathcal{T}_\ell)>\alpha$, and denote by $A_\ell$ its largest element. Next we claim that the tree $\mathcal{T}$ consisting of $C=\bigcup_{\ell\le m}H_\ell$ (its largest element) and the union of the trees $\mathcal{T}_\ell$, $\ell\le m$, is organized, with $j(C)=2j-1$ and $A_1,\dots,A_m$ as children of $C$ (so that $c(C)=m$). To see this, we observe that since $4^{-j(C)-1}=16^{-j}$, we have $4^{-j(C)-1}\le d(t_\ell,t_{\ell'})\le 2\cdot 16^{-j+1}\le 4^{-j(C)+2}$, so that (3.10) holds for $C$. Furthermore, $A_\ell\subset H_\ell\subset B(t_\ell,2\cdot 16^{-j-1})\subset B(t_\ell,4^{-j(C)-2})$, so that this ball contains exactly one child of $C$. Other conditions follow from the fact that the trees $\mathcal{T}_\ell$ are themselves organized. Moreover, $S_{\mathcal{T}}=\bigcup_{\ell\le m}S_{\mathcal{T}_\ell}$.
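Consider $t\in S_{\mathcal{T}}$, and let $\ell$ with $t\in S_{\mathcal{T}_\ell}$. Then $t\in C\in\mathcal{T}$, and also $t\in A\in\mathcal{T}$ whenever $t\in A\in\mathcal{T}_\ell$. Thus, using also in the second line that $j(C)=2j-1$ and that $c(C)=m$, we obtain
$$\begin{aligned}\sum_{t\in A\in\mathcal{T}}4^{-j(A)}\sqrt{\log c(A)}&\ge 4^{-j(C)}\sqrt{\log c(C)}+\sum_{t\in A\in\mathcal{T}_\ell}4^{-j(A)}\sqrt{\log c(A)}\\&\ge 4\cdot 16^{-j}\sqrt{\log m}+\tau(\mathcal{T}_\ell)\ge\frac{1}{L}16^{-j}2^{n/2}+\alpha.\end{aligned}$$
Taking the infimum over $t\in S_{\mathcal{T}}$ gives $F(\bigcup_{\ell\le m}H_\ell)\ge\tau(\mathcal{T})\ge\frac1L16^{-j}2^{n/2}+\alpha$. Since $\alpha<\min_{\ell\le m}F(H_\ell)$ is arbitrary, we have proved that
$$F\Big(\bigcup_{\ell\le m}H_\ell\Big)\ge\frac{1}{L}16^{-j}2^{n/2}+\min_{\ell\le m}F(H_\ell).$$
This completes the proof of the growth condition (2.77).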

If one examines the proof of Theorem 2.9.1, one observes that it requires only the growth condition (2.77) to hold true when $a$ is of the type $r^{-j}$, and we have just proved that this is the case (for $r=16$), so that from (2.81) we have proved that $\gamma_2(T,d)\le L(F(T)+\Delta(T))$. It remains only to prove that $\Delta(T)\le LF(T)$. For this, we simply note that if $s,t\in T$, $s\ne t$, and $j_0$ is the largest integer with $4^{-j_0}\ge d(s,t)$, then the tree $\mathcal{T}$ consisting of $T$, $\{t\}$, $\{s\}$, is organized with $j(T)=j_0$ and $c(T)=2$, so that $F(T)\ge\tau(\mathcal{T})\ge 4^{-j_0}\sqrt{\log 2}$ and $d(s,t)\le 4^{-j_0}\le LF(T)$. □
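The proof above rests on a few identities between powers of 4 and of 16 when $j(C)=2j-1$. They can be verified exactly with rational arithmetic. The function name is hypothetical, and checking finitely many $j$ is only a sanity check of identities that hold for all $j\in\mathbb{Z}$.

```python
from fractions import Fraction


def scales_match(j: int) -> bool:
    """Exact check of the scale relations used for j(C) = 2j - 1 in the proof of Proposition 3.1.2."""
    jC = 2 * j - 1
    four, sixteen = Fraction(4), Fraction(16)
    return (
        four ** (-jC - 1) == sixteen ** (-j)                # 4^{-j(C)-1} = 16^{-j}
        and 2 * sixteen ** (-j + 1) <= four ** (-jC + 2)    # upper bound in (3.10)
        and 2 * sixteen ** (-j - 1) <= four ** (-jC - 2)    # H_l lies in B(t_l, 4^{-j(C)-2})
        and four ** (-jC) == 4 * sixteen ** (-j)            # 4^{-j(C)} = 4 * 16^{-j}
    )


print(all(scales_match(j) for j in range(-6, 7)))  # True
```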


3.1.3 Majorizing Measures 


For a probability measure $\mu$ on a metric space $(T,d)$, with countable support,² we define for each $t\in T$ the quantity
$$I_\mu(t):=\int_0^\infty\sqrt{\log\frac{1}{\mu(B(t,\epsilon))}}\,d\epsilon=\int_0^{\Delta(T)}\sqrt{\log\frac{1}{\mu(B(t,\epsilon))}}\,d\epsilon. \tag{3.17}$$
The second equality follows from the fact that $\mu(B(t,\epsilon))=1$ when $B(t,\epsilon)=T$, so that then the integrand is 0. It is important to master the mechanism at play in the following elementary exercise:


2 We assume $\mu$ with countable support because we do not need a more general setting. The
advantage of this hypothesis is that there are no measurability problems.
Exercise 3.1.3 Consider a number $A>0$ and a non-increasing function $f:[0,A]\to\mathbb{R}^+$. Define $\epsilon_0=A$ and for $n\ge 1$ define $\epsilon_n=\inf\{\epsilon>0;\ f(\epsilon)\le 2^n\}$. Prove that
$$\frac12\sum_{n\ge 1}2^n\epsilon_n\le\int_0^A f(\epsilon)\,d\epsilon\le 2\sum_{n\ge 0}2^n\epsilon_n.$$
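The two inequalities of Exercise 3.1.3 can be sanity-checked numerically; this does not replace the proof. The Python sketch below assumes two test functions whose integrals are known in closed form, $f(\epsilon)=\sqrt{\log(1/\epsilon)}$ on $[0,1]$ (integral $\Gamma(3/2)=\sqrt\pi/2$) and $f(\epsilon)=\epsilon^{-1/2}$ on $[0,1]$ (integral 2). It computes $\epsilon_n$ by bisection and truncates the sums at a hypothetical `n_max`; all names are illustrative.

```python
import math


def eps_n(f, A: float, n: int, iters: int = 200) -> float:
    """Bisection for inf{eps in (0, A] : f(eps) <= 2^n}.

    Assumes f is non-increasing and f(A) <= 2^n; returns an upper approximation,
    and f is only evaluated at points of (0, A).
    """
    lo, hi = 0.0, A
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) <= 2.0 ** n:
            hi = mid
        else:
            lo = mid
    return hi


def dyadic_bounds(f, A: float, n_max: int = 30):
    """(1/2) sum_{n>=1} 2^n eps_n and 2 sum_{n>=0} 2^n eps_n, truncated at n_max, with eps_0 = A."""
    eps = [A] + [eps_n(f, A, n) for n in range(1, n_max + 1)]
    lower = 0.5 * sum(2.0 ** n * e for n, e in enumerate(eps) if n >= 1)
    upper = 2.0 * sum(2.0 ** n * e for n, e in enumerate(eps))
    return lower, upper


def f_log(e):
    return math.sqrt(math.log(1 / e))


def f_pow(e):
    return 1 / math.sqrt(e)


print(dyadic_bounds(f_log, 1.0), math.gamma(1.5))
print(dyadic_bounds(f_pow, 1.0), 2.0)
```

The mechanism is the one of the exercise: on $]\epsilon_{n+1},\epsilon_n[$ the function lies between $2^n$ and $2^{n+1}$, so the integral is comparable to a sum of dyadic levels times interval lengths.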


Proposition 3.1.4 Given a metric space $(T,d)$, we can find on $T$ a probability measure $\mu$, supported by a countable subset of $T$ and such that
$$\sup_{t\in T}I_\mu(t)=\sup_{t\in T}\int_0^\infty\sqrt{\log\frac{1}{\mu(B(t,\epsilon))}}\,d\epsilon\le L\gamma_2(T,d). \tag{3.18}$$


Any probability measure³ $\mu$ on $(T,d)$ is called a majorizing measure. The reason for this somewhat unsatisfactory name is that Xavier Fernique proved that for a Gaussian process and a probability measure $\mu$ on $T$, one has
$$\mathrm{E}\sup_{t\in T}X_t\le L\sup_{t\in T}I_\mu(t), \tag{3.19}$$
so that $\mu$ can be used to “majorize” the process $(X_t)_{t\in T}$.⁴ This was a major advance over Dudley’s bound. The (in)famous theory of majorizing measures used the quantity
$$\inf_\mu\sup_{t\in T}I_\mu(t) \tag{3.20}$$
as a measure of the size of the metric space $(T,d)$, where the infimum is over all choices of the probability measure $\mu$. This method is technically quite challenging. We are going to prove that the quantity (3.20) is equivalent to $\gamma_2(T,d)$. A related idea which is still very useful is explained in Sect. 3.3.
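When $T$ is finite, $\epsilon\mapsto\mu(B(t,\epsilon))$ is a step function that jumps only at the distances $d(t,s)$, so (3.17) is a finite sum. Here is a minimal Python sketch, assuming closed balls $B(t,\epsilon)=\{s;\,d(s,t)\le\epsilon\}$ and a measure given as a dictionary of point masses; the helper names are hypothetical.

```python
import math


def I_mu(points, mu, d, t):
    """I_mu(t) of (3.17) for finitely supported mu, using closed balls.

    eps -> mu(B(t, eps)) is constant on [r, r_next) between consecutive distances from t,
    and the integrand vanishes beyond the largest distance, where the ball is all of T.
    """
    radii = sorted({d(t, s) for s in points})
    total = 0.0
    for r, r_next in zip(radii, radii[1:]):
        mass = sum(mu[s] for s in points if d(t, s) <= r)  # mu(B(t, eps)) for r <= eps < r_next
        if mass <= 0.0:
            return math.inf
        total += (r_next - r) * math.sqrt(max(0.0, math.log(1.0 / mass)))
    return total


def majorizing_bound(points, mu, d):
    """sup_t I_mu(t), the quantity whose infimum over mu is (3.20)."""
    return max(I_mu(points, mu, d, t) for t in points)


def dist(s, t):
    return abs(s - t)


pts = [0, 1, 2, 3]
uniform = {p: 0.25 for p in pts}
print(majorizing_bound(pts, uniform, dist))
```

For the uniform measure on $\{0,1,2,3\}\subset\mathbb{R}$ the supremum is attained at the endpoints 0 and 3.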


Proof Consider an admissible sequence $(\mathcal{A}_n)$ with
$$\forall t\in T,\quad\sum_{n\ge 0}2^{n/2}\Delta(A_n(t))\le 2\gamma_2(T,d).$$
Let us now pick a point $t_{n,A}$ in each set $A\in\mathcal{A}_n$, for each $n\ge 1$. Since $\operatorname{card}\mathcal{A}_n\le N_n$, for each $n$, there are at most $N_n$ points of the type $t_{n,A}$. Attributing a mass $1/(2^nN_n)$ to each of them, we obtain a total mass $\le\sum_{n\ge1}2^{-n}=1$. Thus, there is a probability measure $\mu$ on $T$, supported by a countable set and satisfying $\mu(\{t_{n,A}\})\ge 1/(2^nN_n)$ for each $n\ge 1$ and each $A\in\mathcal{A}_n$. Then, since $2^nN_n\le N_n^2=N_{n+1}$,
$$\forall n\ge 1,\ \forall A\in\mathcal{A}_n,\quad\mu(A)\ge\mu(\{t_{n,A}\})\ge\frac{1}{2^nN_n}\ge\frac{1}{N_{n+1}},$$
so that given $t\in T$ and $n\ge 1$,
$$\epsilon>\Delta(A_n(t))\;\Rightarrow\;\mu(B(t,\epsilon))\ge\frac{1}{N_{n+1}}\;\Rightarrow\;\sqrt{\log\frac{1}{\mu(B(t,\epsilon))}}\le 2^{n/2+1}. \tag{3.21}$$
Now, since $\mu$ is a probability, $\mu(B(t,\epsilon))=1$ for $\epsilon>\Delta(T)$, and then $\log(1/\mu(B(t,\epsilon)))=0$. Thus
$$I_\mu(t)=\int_0^\infty\sqrt{\log\frac{1}{\mu(B(t,\epsilon))}}\,d\epsilon=\sum_{n\ge 0}\int_{\Delta(A_{n+1}(t))}^{\Delta(A_n(t))}\sqrt{\log\frac{1}{\mu(B(t,\epsilon))}}\,d\epsilon\le\sum_{n\ge 0}2^{(n+1)/2+1}\Delta(A_n(t))\le L\gamma_2(T,d),$$
using (3.21). □

3 To avoid technicalities, one may assume that $\mu$ has countable support.
4 One typically uses the name only when the right-hand side of (3.19) is usefully small.
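The numerical facts used in this proof, with $N_n=2^{2^n}$ and points picked for the levels $n\ge1$, can be checked exactly for small $n$. A minimal Python sketch with hypothetical names (finitely many $n$ only, so a sanity check rather than a proof):

```python
import math
from fractions import Fraction


def N(n: int) -> int:
    """N_n = 2^(2^n), the bound on card A_n for an admissible sequence."""
    return 2 ** (2 ** n)


def check_construction(n_max: int = 10) -> bool:
    levels = range(1, n_max + 1)
    # at most N_n points at level n, each of mass 1/(2^n N_n): total mass <= sum_n 2^-n <= 1
    total = sum(Fraction(N(n), 2 ** n * N(n)) for n in levels)
    # mu(A) >= 1/(2^n N_n) >= 1/N_{n+1}, because 2^n N_n <= N_n^2 = N_{n+1}
    ratio_ok = all(2 ** n * N(n) <= N(n + 1) and N(n + 1) == N(n) ** 2 for n in levels)
    # sqrt(log N_{n+1}) = sqrt(2^(n+1) log 2) <= 2^(n/2 + 1), which is (3.21)
    log_ok = all(math.sqrt(2 ** (n + 1) * math.log(2)) <= 2 ** (n / 2 + 1) for n in levels)
    return total <= 1 and ratio_ok and log_ok


print(check_construction())  # True
```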


Proposition 3.1.5 If $\mu$ is a probability measure on $T$ (supported by a countable set) and $\mathcal{T}$ is a separated tree on $T$, then
$$\rho(\mathcal{T})\le L\sup_{t\in T}I_\mu(t).$$
Combining with (3.14), (3.15), and (3.18), this completes the proof that the four “measures of the size of $T$” considered in this section, namely, (3.9), (3.13), (3.20), and $\gamma_2(T,d)$ are indeed equivalent.


Proof The basic observation is as follows: the sets
$$B(C,4^{-s(A)-1})=\{x\in T;\ d(x,C)<4^{-s(A)-1}\}$$
are disjoint as $C$ varies over the children of $A$ (as follows from (3.6)), so that one of them has measure $\le c(A)^{-1}$.

We then proceed in the following manner, constructing recursively an appropriate branch of the tree. This is a typical and fundamental way to proceed when working with trees. We start with the largest element $A_0$ of $\mathcal{T}$. We then select a child $A_1$ of $A_0$ with $\mu(B(A_1,4^{-s(A_0)-1}))\le 1/c(A_0)$, and a child $A_2$ of $A_1$ with $\mu(B(A_2,4^{-s(A_1)-1}))\le 1/c(A_1)$, etc., and continue this construction as long as we can. It ends only when we reach a set of $\mathcal{T}$ that has no child and hence by (3.4) is reduced to a single point $t$ which we now fix. For any set $A$ with $t\in A\in\mathcal{T}$ (and $c(A)\ge1$), by construction, we have
$$\mu(B(t,4^{-s(A)-1}))\le\frac{1}{c(A)}$$
so that
$$4^{-s(A)-2}\sqrt{\log c(A)}\le\int_{4^{-s(A)-2}}^{4^{-s(A)-1}}\sqrt{\log\frac{1}{\mu(B(t,\epsilon))}}\,d\epsilon, \tag{3.22}$$
because the integrand is $\ge\sqrt{\log c(A)}$ and the length of the interval of integration is larger than $4^{-s(A)-2}$. By (3.7), the intervals $]4^{-s(A)-2},4^{-s(A)-1}]$ are disjoint for different sets $A$ with $t\in A\in\mathcal{T}$, so summation of the inequalities (3.22) yields
$$\frac{1}{16}\rho(\mathcal{T})\le\frac{1}{16}\sum_{t\in A\in\mathcal{T}}4^{-s(A)}\sqrt{\log c(A)}\le\int_0^\infty\sqrt{\log\frac{1}{\mu(B(t,\epsilon))}}\,d\epsilon=I_\mu(t)\le\sup_{t\in T}I_\mu(t).\qquad□$$
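The recursive choice of a branch in this proof is an algorithm. The Python sketch below follows it; the names are hypothetical, and the finite tree stored as `frozenset`s, the small example tree on four real points with labels $s(A)$ chosen by hand, and the uniform measure are all assumptions of the illustration. It asserts at each step the bound $\le 1/c(A)$ guaranteed by disjointness and compares the left-hand side of the summed inequality (3.22) with $I_\mu(t)$.

```python
import math


def children(tree, A):
    """Children of A in the sense of (3.3)."""
    proper = [B for B in tree if B < A]
    return [B for B in proper if not any(B < C < A for C in proper)]


def I_mu(points, mu, d, t):
    """I_mu(t) of (3.17) for finitely supported mu and closed balls."""
    radii = sorted({d(t, s) for s in points})
    total = 0.0
    for r, r_next in zip(radii, radii[1:]):
        mass = sum(mu[s] for s in points if d(t, s) <= r)
        if mass <= 0.0:
            return math.inf
        total += (r_next - r) * math.sqrt(max(0.0, math.log(1.0 / mass)))
    return total


def neighbourhood_mass(C, r, points, mu, d):
    """mu(B(C, r)) with B(C, r) = {x : d(x, C) < r}."""
    return sum(mu[x] for x in points if min(d(x, c) for c in C) < r)


def select_branch(tree, s, points, mu, d):
    """Follow the branch chosen in the proof of Proposition 3.1.5; return the leaf point and the path."""
    A = max(tree, key=len)  # largest element of the tree
    path = []
    while True:
        ch = children(tree, A)
        if not ch:
            (t,) = A  # by (3.4) a set without children is a single point
            return t, path
        r = 4.0 ** (-s[A] - 1)
        masses = [neighbourhood_mass(C, r, points, mu, d) for C in ch]
        k = min(range(len(ch)), key=masses.__getitem__)
        # the neighbourhoods are disjoint by (3.6), so the lightest has mass <= 1/c(A)
        assert masses[k] <= 1.0 / len(ch) + 1e-12
        path.append(A)
        A = ch[k]


def dist(x, y):
    return abs(x - y)


root = frozenset({0, 1, 10, 11})
tree = {root, frozenset({0, 1}), frozenset({10, 11})} | {frozenset({x}) for x in root}
s = {root: -1, frozenset({0, 1}): 0, frozenset({10, 11}): 0}
points = sorted(root)
mu = {x: 0.25 for x in points}
t, path = select_branch(tree, s, points, mu, dist)
lhs = sum(4.0 ** (-s[A] - 2) * math.sqrt(math.log(len(children(tree, A)))) for A in path)
print(t, lhs, I_mu(points, mu, dist, t))
```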


In the rest of this chapter, we will implicitly use the previous method of “selecting 
recursively the branch of the tree we follow” to prove lower bounds without 
mentioning trees. 

We end this section by an exercise completing the proof of (2.143). 




Exercise 3.1.6 Consider metric spaces $(T_k,d_k)_{k\le N}$ and probability measures $\mu_k$ on $T_k$. Consider the product probability $\mu$ on $T=\prod_{k\le N}T_k$ and the distance (2.142).

(a) Prove that for $t=(t_k)_{k\le N}$, we have
$$I_\mu(t)\le L\sum_{k\le N}I_{\mu_k}(t_k).$$
Hint: Use (2.147).
(b) Complete the proof of (2.143).


3.2 Rolling Up Our Sleeves: Trees in Ellipsoids 


It is one thing to have proved abstract results but quite another thing to visualize 
the combinatorics in concrete situations. Consider an ellipsoid $\mathcal{E}$ as in (2.154), so that according to (2.155), $S:=\sqrt{\sum_{i\ge1}a_i^2}$ measures its size. Assuming $S<\infty$, the goal of the present section is to construct explicitly an organized tree $\mathcal{T}$ whose depth $\tau(\mathcal{T})$ witnesses the size of the ellipsoid, i.e., $\tau(\mathcal{T})\ge S/L$. This elaborate exercise will have us confront a number of technical difficulties.
We first reduce to the case where each $a_i$ is of the type $2^{-k}$ for some $k\in\mathbb{Z}$. Let $\mathbb{N}^*=\mathbb{N}\setminus\{0\}$. For $k\in\mathbb{Z}$, let us set $I_k=\{i\in\mathbb{N}^*;\ 2^{-k}\le a_i<2^{-k+1}\}$, so that the sets $I_k$ cover $\mathbb{N}^*$ and $\sum_k2^{-2k}\operatorname{card}I_k\ge\sum_{i\ge1}a_i^2/4=S^2/4$, while at the same time
$$\mathcal{E}':=\Big\{t\in\ell^2;\ \sum_k\sum_{i\in I_k}2^{2k}t_i^2\le 1\Big\}\subset\mathcal{E}. \tag{3.23}$$
Thus, we have reduced the problem to considering only ellipsoids of the type $\mathcal{E}'$.
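Here comes a somewhat unexpected argument. We are going to replace $\mathcal{E}'$ by a set of the type
$$P=\Big\{t\in\ell^2;\ \forall k,\ \sum_{i\in I_k}t_i^2\le\alpha_k2^{-2k}\Big\},$$
where $\sum_k\alpha_k=1$ (a condition which ensures that $P\subset\mathcal{E}'$). At first sight, the set $P$ looks much smaller than $\mathcal{E}'$, but this is not the case. First, how should we choose $\alpha_k$ to ensure that $P$ is as large as possible? Considering independent Gaussian r.v.s $(g_i)$, we have $\sup_{t\in P}\sum_{i\ge1}t_ig_i=\sum_k\sqrt{\alpha_k}\,2^{-k}\big(\sum_{i\in I_k}g_i^2\big)^{1/2}$, so that since $\mathrm{E}\big(\sum_{i\in I_k}g_i^2\big)^{1/2}$ is about $\sqrt{\operatorname{card}I_k}$ by (2.155), we obtain
$$\mathrm{E}\sup_{t\in P}\sum_{i\ge1}t_ig_i\ \text{is about}\ \sum_k\sqrt{\alpha_k}\,2^{-k}\sqrt{\operatorname{card}I_k}. \tag{3.24}$$
So to maximize this quantity (and, in a sense, the size of $P$), it is a good idea to choose $\alpha_k=2^{-2k}\operatorname{card}I_k/S'^2$ where $S'^2=\sum_k2^{-2k}\operatorname{card}I_k$. Then, $P$ takes the form
$$P=\Big\{t\in\ell^2;\ \forall k,\ \sum_{i\in I_k}t_i^2\le\frac{2^{-4k}\operatorname{card}I_k}{S'^2}\Big\}.$$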


This set is very simple: geometrically, $P$ is a product of balls of dimension $\operatorname{card}I_k$ and radius $r_k$, where $r_k$ is defined by $r_k^2=2^{-4k}\operatorname{card}I_k/S'^2$. It will be very useful to reduce to the case where the radii of these balls are quite different from each other. This is the purpose of the next lemma.
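The preliminary reduction can be traced on a finite example. The sketch below (hypothetical names; finitely many coefficients $a_i>0$ chosen arbitrarily) computes the classes $I_k$ with `math.frexp`, then $S'^2$, the weights $\alpha_k$ and the radii $r_k$, and checks that $S^2/4\le S'^2\le S^2$, that $\sum_k\alpha_k=1$, and that this choice of $\alpha_k$ makes the quantity in (3.24) equal to $S'$, the equality case of the Cauchy–Schwarz inequality.

```python
import math
from collections import Counter


def dyadic_class(a: float) -> int:
    """The k with 2^{-k} <= a < 2^{-k+1} (a > 0); frexp writes a = m 2^e with 1/2 <= m < 1."""
    _, e = math.frexp(a)
    return 1 - e


def reduce_ellipsoid(a):
    """Return card I_k, S^2, S'^2, the weights alpha_k and the radii r_k for finitely many a_i > 0."""
    cards = Counter(dyadic_class(x) for x in a)                         # card I_k
    S2 = sum(x * x for x in a)                                          # S^2 = sum_i a_i^2
    Sp2 = sum(2.0 ** (-2 * k) * c for k, c in cards.items())            # S'^2
    alpha = {k: 2.0 ** (-2 * k) * c / Sp2 for k, c in cards.items()}
    r = {k: 2.0 ** (-2 * k) * math.sqrt(c / Sp2) for k, c in cards.items()}  # r_k^2 = 2^{-4k} card I_k / S'^2
    return cards, S2, Sp2, alpha, r


cards, S2, Sp2, alpha, r = reduce_ellipsoid([1.0, 0.75, 0.5, 0.3, 0.1, 0.1])
print(dict(cards), S2, Sp2)
```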


Lemma 3.2.1 There is a subset $J$ of $\mathbb{Z}$ with the following two properties:
$$\sum_{k\in J}2^{-2k}\operatorname{card}I_k\ge S'^2/L, \tag{3.25}$$
$$k,n\in J,\ k<n\;\Rightarrow\;r_n\le 2^{-6}r_k. \tag{3.26}$$
Proof Since $S<\infty$, the sequence $a_k=2^{-2k}\operatorname{card}I_k$ is bounded. We apply Lemma 2.9.5 (or more precisely the version of this lemma where the index set is $\mathbb{Z}$ rather than $\mathbb{N}$) with $\alpha=2$ to find a finite subset $I\subset\mathbb{Z}$ such that
$$k,n\in I,\ k\ne n\;\Rightarrow\;a_n<a_k2^{|n-k|}, \tag{3.27}$$
and
$$\sum_{k\in I}a_k\ge S'^2/L. \tag{3.28}$$
Consider $k,n\in I$ with $k<n$. Then $a_n\le 2^{n-k}a_k$, and recalling the value of $a_k$, this means that $\operatorname{card}I_n\le 2^{3(n-k)}\operatorname{card}I_k$. Recalling the value of $r_k$, this means that $r_n^2\le 2^{-(n-k)}r_k^2\le r_k^2/2$. Let us now enumerate $I$ as a finite sequence $(n(\ell))_{0\le\ell\le N}$ in increasing order, so that $r_{n(\ell+1)}\le r_{n(\ell)}/\sqrt2$ and $r_{n(\ell+12)}\le 2^{-6}r_{n(\ell)}$. For $0\le p\le 11$, consider the set $J_p=\{n(p+12q);\ q\ge 0,\ p+12q\le N\}$, so that $I=\bigcup_{0\le p\le 11}J_p$. Consequently, $\sum_{k\in I}a_k=\sum_{0\le p\le 11}\sum_{k\in J_p}a_k$. Thus, using (3.28), there exists some $p$, $0\le p\le 11$, such that $\sum_{k\in J_p}a_k\ge S'^2/L$, and the set $J_p$ satisfies the desired requirements. □
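The last step of this proof, splitting $I$ into twelve classes and keeping the heaviest one, is sketched below in Python. The sketch does not reproduce Lemma 2.9.5: it takes the set $I$ as given, and the example data (radii decreasing exactly by a factor $\sqrt2$ along $I$, arbitrary weights $a_k$) are assumptions. The function name is hypothetical.

```python
def heaviest_residue_class(I, a):
    """Split I = {n(0) < n(1) < ...} into the classes J_p = {n(p + 12q)} and return the heaviest one.

    By pigeonhole, some class carries at least 1/12 of sum_{k in I} a_k.
    """
    order = sorted(I)
    classes = [order[p::12] for p in range(12)]
    return max(classes, key=lambda J: sum(a[k] for k in J))


# Hypothetical data: radii decaying exactly by 1/sqrt(2) along I = {0, ..., 29}.
I = list(range(30))
a = {k: 1 + k % 5 for k in I}
r = {k: 2.0 ** (-k / 2) for k in I}
J = heaviest_residue_class(I, a)
print(J, sum(a[k] for k in J), sum(a.values()))
```

Twelve steps of ratio at most $1/\sqrt2$ give a ratio at most $2^{-6}$, which is why consecutive elements of each class satisfy (3.26).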


Consider a set $J$ as constructed in Lemma 3.2.1. We then replace $P$ by the subset $P'$ consisting of the points $t\in P$ such that $t_i=0$ when $i\in I_k$ and $k\notin J$. We note that $\sum_{k\in J}r_k\sqrt{\operatorname{card}I_k}=\sum_{k\in J}2^{-2k}\operatorname{card}I_k/S'\ge S'/L$, where the last inequality follows from (3.25).

We have now finished our preliminary reductions. To construct inside any ellipsoid $\mathcal{E}$ an organized tree $\mathcal{T}$ such that its depth $\tau(\mathcal{T})$ witnesses the size of $\mathcal{E}$, it suffices to perform the same task for a set of the type
$$P'=\Big\{t\in\ell^2(I^*);\ \forall k\le N,\ \sum_{i\in I_k}t_i^2\le r_k^2\Big\},$$
where $N$ is a given integer, where $(I_k)_{k\le N}$ are disjoint subsets of $\mathbb{N}^*$ of union $I^*$, and where $r_{k+1}\le 2^{-6}r_k$. Just as in (3.24), the size of $P'$ is about $\sum_{k\le N}r_k\sqrt{\operatorname{card}I_k}$. For $k\le N$, let us consider the sphere
$$S_k=\Big\{t\in\ell^2(I^*);\ \sum_{i\in I_k}t_i^2=r_k^2,\ i\notin I_k\Rightarrow t_i=0\Big\}.$$
It follows from the volume argument (2.45) (used for $A=B$ and $\epsilon=1/2$) that there is a subset $U_k$ of $S_k$ with
$$\operatorname{card}U_k\ge 2^{\operatorname{card}I_k} \tag{3.29}$$
such that any two distinct points of $U_k$ are at distance $\ge r_k/2$. Given $1\le m\le N$ and for $k\le m$ given $y_k\in U_k$, $y_k=(y_{k,i})_{i\in I^*}$, consider the set $A=A(y_1,\dots,y_m)\subset P'$ defined as
$$A(y_1,\dots,y_m)=\{t\in P';\ \forall k\le m,\ \forall i\in I_k,\ t_i=y_{k,i}\}.$$


We will show that these sets, together with the set $P'$, form an organized tree $\mathcal{T}$ (as defined in the previous section). When $m<N$, we have $c(A)=\operatorname{card}U_{m+1}$: the children of $A$ are the sets $A(y_1,\dots,y_m,y)$ where $y\in U_{m+1}$. When $m=N$, we have $c(A)=0$ and $A(y_1,\dots,y_N)$ consists of a single point. We now check the condition (3.10). Consider $m<N$. Define $j(A)$ as the smallest integer $j$ such that $4^{-j-1}\le r_{m+1}/2$, so that $r_{m+1}\le 2\cdot4^{-j}$. For $y\in U_{m+1}$, consider the unique point $t(y)\in A(y_1,\dots,y_m,y)$ such that $t(y)_i=0$ if $i\in I_k$ for $k\ge m+2$. Then for $y,y'\in U_{m+1}$, $y\ne y'$, the distance $d(t(y),t(y'))$ is the same as the distance between $y$ and $y'$, which are different points of $U_{m+1}\subset S_{m+1}$. Thus
$$4^{-j(A)-1}\le r_{m+1}/2\le d(t(y),t(y'))\le 2r_{m+1}\le 4^{-j(A)+2}.$$
Furthermore, recalling that $r_{k+1}\le 2^{-6}r_k$ (so that in particular $\sum_{k\ge m+2}r_k\le 2^{-5}r_{m+1}$), if $x\in A(y_1,\dots,y_m,y)$, then
$$d(x,t(y))\le\sum_{k\ge m+2}r_k\le 2^{-5}r_{m+1}\le 2^{-4}\cdot4^{-j(A)}=4^{-j(A)-2},$$
so that $A(y_1,\dots,y_m,y)\subset B(t(y),4^{-j(A)-2})$ as required. Let us now study $\tau(\mathcal{T})$. A branch in $\mathcal{T}$ is defined by a point $t\in S_{\mathcal{T}}$, which is the unique point of a set of the type $A(y_1,\dots,y_N)$. Let us set $A_0=P'$ and $A_m:=A(y_1,\dots,y_m)$ for $1\le m\le N$. Then $t\in A_m$ for $0\le m\le N$. Also, $c(A_m)=\operatorname{card}U_{m+1}$ and $4^{-j(A_m)}\ge r_{m+1}/2$. Thus, using (3.29) in the last inequality,
$$\sum_{t\in A\in\mathcal{T}}4^{-j(A)}\sqrt{\log c(A)}=\sum_{0\le m<N}4^{-j(A_m)}\sqrt{\log c(A_m)}\ge\sum_{0\le m<N}r_{m+1}\sqrt{\log\operatorname{card}U_{m+1}}/L\ge\sum_{1\le k\le N}r_k\sqrt{\operatorname{card}I_k}/L,$$
and, as we have seen, this last quantity is the size of $P'$.
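The choice of $j(A)$ and the three inequalities checked above can be traced numerically. In this sketch (hypothetical names), `r[m]` plays the role of $r_{m+1}$ and `cards[m]` of $\operatorname{card}I_{m+1}$, the radii satisfy $r_{k+1}\le2^{-6}r_k$ as in (3.26), and $\log c(A_m)$ is replaced by its lower bound $\operatorname{card}I_{m+1}\log2$ from (3.29); the example radii and cardinalities are made up.

```python
import math


def j_of(r: float) -> int:
    """Smallest integer j with 4^{-j-1} <= r/2 (the j(A) used for children at radius r)."""
    j = math.ceil(-1 - math.log(r / 2, 4))  # initial guess, corrected below against rounding
    while 4.0 ** (-j) <= r / 2:  # j - 1 also works
        j -= 1
    while 4.0 ** (-j - 1) > r / 2:  # j does not work
        j += 1
    return j


def check_tree_scales(r, cards):
    """Check the scale conditions along a branch and return (ok, lower bound for the branch sum)."""
    depth = 0.0
    for m, rm in enumerate(r):
        j = j_of(rm)
        tail = sum(r[m + 1:])  # bounds d(x, t(y)) for x in a child
        ok = (4.0 ** (-j - 1) <= rm / 2       # lower bound in (3.10)
              and 2 * rm <= 4.0 ** (-j + 2)   # upper bound in (3.10)
              and tail <= 4.0 ** (-j - 2))    # the child lies in B(t(y), 4^{-j-2})
        if not ok:
            return False, 0.0
        depth += 4.0 ** (-j) * math.sqrt(cards[m] * math.log(2))
    return True, depth


radii = [1.0, 2.0 ** -6, 2.0 ** -12, 2.0 ** -19]
cards = [3, 5, 2, 7]
print(check_tree_scales(radii, cards))
```

Because $4^{-j(A_m)}>r_{m+1}/2$, the branch sum is at least half of $\sum_k r_k\sqrt{\operatorname{card}I_k\log2}$, which is the last display up to the constant $L$.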


3.3 Fernique’s Functional


3.3.1 Fernique’s Functional 


In Sect. 3.1, we presented four equivalent methods to measure the size of a metric space. (Besides our usual $\gamma_2(T,d)$, these were the maximum depth of a separated or organized tree contained in $T$ and majorizing measures.) Recalling the quantity $I_\mu(t) = \int_0^{\Delta(T)}\sqrt{\log(1/\mu(B(t,\epsilon)))}\,d\epsilon$ of (3.17), it will turn out that a fifth measure of size, the quantity

$$\operatorname{Fer}(T,d) := \sup_\mu \int_T I_\mu(t)\,d\mu(t) , \tag{3.30}$$

will play an important role. Here, the supremum is over all probability measures $\mu$ on $T$ which are supported by a countable set.
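As an illustration of (3.30), the following sketch evaluates $I_\mu(t) = \int_0^{\Delta(T)}\sqrt{\log(1/\mu(B(t,\epsilon)))}\,d\epsilon$ and the integral $\int_T I_\mu\,d\mu$ for a probability $\mu$ on a finite metric space; any such integral is a lower bound for $\operatorname{Fer}(T,d)$. It assumes closed balls $B(t,\epsilon) = \{s ;\ d(s,t) \le \epsilon\}$ (the integrand changes only at finitely many values of $\epsilon$, so this choice does not affect the integral), and the function names are ours, chosen for illustration only.

```python
import math

def i_mu(t, points, dist, mu):
    """I_mu(t) = int_0^{Delta(T)} sqrt(log(1 / mu(B(t, eps)))) d eps on a finite space."""
    diam = max(dist(s, u) for s in points for u in points)
    # eps -> mu(B(t, eps)) is a step function that only jumps at the distances d(t, s)
    radii = sorted({dist(t, s) for s in points} | {diam})
    total = 0.0
    for lo, hi in zip(radii, radii[1:]):
        # for eps in [lo, hi) the closed ball B(t, eps) is {s : d(t, s) <= lo}
        mass = sum(mu[s] for s in points if dist(t, s) <= lo)
        if mass <= 0.0:
            return math.inf
        if mass < 1.0:
            total += (hi - lo) * math.sqrt(math.log(1.0 / mass))
    return total

def fernique_integral(points, dist, mu):
    """int_T I_mu(t) d mu(t), a lower bound for Fer(T, d) as defined in (3.30)."""
    return sum(mu[t] * i_mu(t, points, dist, mu) for t in points if mu[t] > 0)

# Two points at distance 1, each of mass 1/2 (the measure used in Lemma 3.3.5).
pts = ["s", "u"]
d = lambda x, y: 0.0 if x == y else 1.0
mu = {"s": 0.5, "u": 0.5}
value = fernique_integral(pts, d, mu)
```

For this two-point example each ball of radius $\epsilon < 1$ has mass $1/2$, so the value is $\sqrt{\log 2}$.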

Why is the functional (3.30) important? This is far from obvious. In the context 
of Gaussian processes, this functional has no special importance, and the related 
notion of majorizing measures is not particularly useful. We will first understand the 
usefulness of Fernique’s functional in Chap. 5 while studying a class of processes 
which are conditionally Gaussian. Furthermore, in the following chapters, we 
will be able to use similar ideas in far more general situations, while the proper 
generalization of the other functionals has not been found yet. In this section, we 
will prove that Fernique’s functional is equivalent to the functional $\gamma_2(T,d)$:


$$\frac{1}{L}\gamma_2(T,d) \le \operatorname{Fer}(T,d) \le L\gamma_2(T,d) . \tag{3.31}$$


We will not give the simplest possible proof of this fact. Rather we will prepare for 
future work by giving arguments which contain in germ the ideas which will prove 
fruitful. The ideas of this section will not be critically used before Chap. 11. 

A further understanding of Fernique’s functional will be reached in Sect. 3.5, where we will basically show the remarkable fact that the supremum in the right-hand side of (3.30) is obtained when $\mu$ is the “law of the supremum”, i.e., the law of a r.v. $\tau$ such that $X_\tau = \sup_{t \in T} X_t$ (see Theorem 3.5.1).

The right-hand side inequality in (3.31) is the easiest and is a consequence of the 
following: 


Proposition 3.3.1 Consider a probability measure $\mu$ on a metric space $(T,d)$. Then

$$\int_T I_\mu(t)\,d\mu(t) \le L\gamma_2(T,d) . \tag{3.32}$$
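Proof For each $t \in T$, we define $\epsilon_0(t) = \Delta(T)$, and for $n \ge 1$, we define

$$\epsilon_n(t) = \inf\{\epsilon > 0 ;\ \mu(B(t,\epsilon)) \ge N_n^{-1}\} . \tag{3.33}$$

Thus, $\sqrt{\log(1/\mu(B(t,\epsilon)))} \le L2^{n/2}$ for $\epsilon > \epsilon_n(t)$, and then

$$I_\mu(t) = \int_0^{\Delta(T)}\sqrt{\log\frac{1}{\mu(B(t,\epsilon))}}\,d\epsilon = \sum_{n \ge 0}\int_{\epsilon_{n+1}(t)}^{\epsilon_n(t)}\sqrt{\log\frac{1}{\mu(B(t,\epsilon))}}\,d\epsilon \le L\sum_{n \ge 0} 2^{n/2}\epsilon_n(t) . \tag{3.34}$$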



Consider an admissible sequence $(\mathcal{A}_n)$ of partitions with

$$\forall t \in T ,\quad \sum_{n \ge 0} 2^{n/2}\Delta(A_n(t)) \le 2\gamma_2(T,d) . \tag{3.35}$$
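Let us fix $n \ge 0$ and set

$$T_1 = \{t \in T ;\ \mu(A_n(t)) \ge N_{n+1}^{-1}\} ;\quad T_2 = T \setminus T_1 = \{t \in T ;\ \mu(A_n(t)) < N_{n+1}^{-1}\} .$$

Thus, $T_2$ is the union of the sets $A \in \mathcal{A}_n$ which are of measure $< N_{n+1}^{-1}$. Since $\operatorname{card}\mathcal{A}_n \le N_n$, we have $\mu(T_2) \le N_n N_{n+1}^{-1} = N_n^{-1}$. Also, by definition of $\epsilon_{n+1}(t)$, we have $\epsilon_{n+1}(t) \le \Delta(A_n(t))$ if $t \in T_1$, and $\epsilon_{n+1}(t) \le \Delta(T)$ if $t \in T_2$. Consequently,

$$\int_T 2^{n/2}\epsilon_{n+1}(t)\,d\mu(t) = \int_{T_1} 2^{n/2}\epsilon_{n+1}(t)\,d\mu(t) + \int_{T_2} 2^{n/2}\epsilon_{n+1}(t)\,d\mu(t) \le \int_T 2^{n/2}\Delta(A_n(t))\,d\mu(t) + L\Delta(T)2^{n/2}N_n^{-1} .$$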
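Combining with (3.34), and using that $\sum_{n \ge 0} 2^{n/2}N_n^{-1} \le L$, we obtain

$$\int_T I_\mu(t)\,d\mu(t) \le L\Delta(T) + L\int_T\sum_{n \ge 0} 2^{n/2}\Delta(A_n(t))\,d\mu(t) . \tag{3.36}$$

Integrating (3.35) with respect to $\mu$ proves that the last term of (3.36) is $\le L\gamma_2(T,d)$ and concludes the proof since $\Delta(T) \le L\gamma_2(T,d)$. □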


3.3.2 Fernique’s Convexity Argument 


Our approach to the left-hand side of (3.31) is based on the following elementary 
fact, which is a consequence of the Hahn-Banach theorem: 


Lemma 3.3.2 Consider a number $a > 0$. Consider a set $S$ of functions on a finite set $T$. Assume that for each probability measure $\nu$ on $T$, there exists $f \in S$ such that $\int f\,d\nu \le a$. Then for each $\epsilon > 0$, there is a function $f$ in the convex hull of $S$ such that $f \le a + \epsilon$ everywhere.


Proof Denote by $S^+$ the set of functions $g$ such that there exists $f \in S$ with $f \le g$. Denote by $C$ the closed convex hull of $S^+$. We prove that the constant function $a$, equal to $a$ everywhere, belongs to $C$. We proceed by contradiction. If this is not the case, by the Hahn-Banach theorem, we may separate $C$ and $a$. That is, there exists a linear functional $\varphi$ on the space of functions on $T$ such that $\varphi(f) > \varphi(a)$ for $f \in C$. Consider then a function $g$ on $T$ with $g \ge 0$. For each $\lambda > 0$ and each $f \in S$, we have $f + \lambda g \ge f$, so that $f + \lambda g \in C$ and hence $\varphi(f) + \lambda\varphi(g) > \varphi(a)$. This proves that $\varphi(g) \ge 0$, i.e., $\varphi$ is positive. Since $T$ is finite, $\varphi$ is of the form $\varphi(g) = \sum_{t \in T}\alpha_t g(t)$ for numbers $\alpha_t \ge 0$. Setting $\beta = \sum_{t \in T}\alpha_t$, consider the probability measure $\nu$ on $T$ given by $\nu(\{t\}) = \alpha_t/\beta$ for $t \in T$. Then $\varphi(g) = \beta\int g\,d\nu$ for each function $g$ on $T$. Taking $g = a$ shows that $\beta a = \varphi(a)$. By hypothesis, there exists $f \in S \subset C$ with $\int f\,d\nu \le a$. Then $\varphi(f) = \beta\int f\,d\nu \le \beta a = \varphi(a)$, a contradiction.

So we have proved that $a \in C$, the closure of the convex hull of $S^+$. Consequently, there is one point of this convex hull which is $\le a + \epsilon$ everywhere. The result follows. □


Of course, the hypothesis that T is finite is inessential; it is just to avoid secondary 
complications. 

Let us give a version of the basic lemma sufficiently general to cover all our 
needs. 


Lemma 3.3.3 Consider a finite metric space $(T,d)$. Consider a convex function $\Phi : ]0,1] \to \mathbb{R}^+$. Assume that for a certain number $D$ and for each probability measure $\mu$ on $T$, one has

$$\int_T\int_0^{\Delta(T)}\Phi(\mu(B(t,\epsilon)))\,d\epsilon\,d\mu(t) \le D . \tag{3.37}$$

Then there exists a probability measure $\mu$ on $T$ for which

$$\sup_{t \in T}\int_0^{\Delta(T)}\Phi(\mu(B(t,\epsilon)))\,d\epsilon \le 2D . \tag{3.38}$$


Proof Let us denote by $M(T)$ the set of probability measures on $T$. The class $\mathcal{C}$ of functions $f$ on $T$ that satisfy

$$\exists\mu \in M(T) ;\ \forall t \in T ,\ f_\mu(t) := \int_0^{\Delta(T)}\Phi(\mu(B(t,\epsilon)))\,d\epsilon \le f(t)$$

is convex. This is immediate to check using the convexity of $\Phi$. For each probability measure $\nu$ on $T$, there exists $f$ in $\mathcal{C}$ with $\int f\,d\nu \le D$: this is true for $f = f_\nu$ by (3.37). Consequently, by Lemma 3.3.2 (used for $a = D$ and $\epsilon = D$), there exists $f \in \mathcal{C}$ such that $f \le 2D$, which is the content of the lemma. □


Corollary 3.3.4 Consider a finite metric space $(T,d)$. Assume that for a certain number $C$ and for each probability measure $\mu$ on $T$, we have

$$\int_T I_\mu(t)\,d\mu(t) \le C . \tag{3.39}$$

Then there is a probability measure $\mu$ on $T$ such that

$$\forall t \in T ,\quad I_\mu(t) \le 2C + 2\Delta(T) . \tag{3.40}$$
Proof Calculus shows that the function $\Phi(x) = \sqrt{\log(e/x)}$ is convex for $x \in ]0,1]$ and that $\sqrt{\log(1/x)} \le \Phi(x) \le 1 + \sqrt{\log(1/x)}$. Thus

$$I_\mu(t) \le \int_0^{\Delta(T)}\Phi(\mu(B(t,\epsilon)))\,d\epsilon \le \Delta(T) + I_\mu(t) ,$$

so that (3.39) implies that (3.37) holds for $D = C + \Delta(T)$, and (3.40) then follows from (3.38). □
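The two calculus facts used in this proof can be checked numerically. The sketch below, our own illustration and not part of the argument, tests the convexity of $\Phi(x) = \sqrt{\log(e/x)}$ through discrete second differences on a grid of $]0,1]$, together with the sandwich $\sqrt{\log(1/x)} \le \Phi(x) \le 1 + \sqrt{\log(1/x)}$. A finite grid only provides evidence, not a proof.

```python
import math

def phi(x):
    # Phi(x) = sqrt(log(e / x)) = sqrt(1 + log(1 / x)) on ]0, 1]
    return math.sqrt(1.0 + math.log(1.0 / x))

def check_convex_and_sandwich(num=2000):
    h = 1.0 / num
    xs = [(k + 1) / num for k in range(num)]  # grid 1/num, 2/num, ..., 1 of ]0, 1]
    # convexity: discrete second differences are nonnegative (up to rounding)
    convex = all(phi(x - h) + phi(x + h) - 2.0 * phi(x) >= -1e-12 for x in xs[1:-1])
    # sandwich: sqrt(log(1/x)) <= Phi(x) <= 1 + sqrt(log(1/x))
    sandwich = all(
        math.sqrt(math.log(1.0 / x)) <= phi(x) <= 1.0 + math.sqrt(math.log(1.0 / x))
        for x in xs
    )
    return convex, sandwich

convex, sandwich = check_convex_and_sandwich()
```

The upper bound in the sandwich is just $\sqrt{1+y} \le 1 + \sqrt{y}$ for $y = \log(1/x) \ge 0$.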


Lemma 3.3.5 We have $\Delta(T) \le L\operatorname{Fer}(T,d)$.

Proof Consider $s, u \in T$ with $d(s,u) > \Delta(T)/2$ and the probability $\mu$ on $T$ such that $\mu(\{s\}) = \mu(\{u\}) = 1/2$. For $\epsilon < \Delta(T)/2 < d(s,u)$, we have $\mu(B(s,\epsilon)) \le 1/2$, and this implies that $I_\mu(s) \ge \Delta(T)\sqrt{\log 2}/2$. Similarly, we have $I_\mu(u) \ge \Delta(T)/L$, so that $\int_T I_\mu(t)\,d\mu(t) \ge \Delta(T)/L$. □


Proof of (3.31) When $T$ Is Finite Combining Corollary 3.3.4 and Lemma 3.3.5, we obtain that there exists a probability $\mu$ on $T$ such that $\sup_{t \in T} I_\mu(t) \le L\operatorname{Fer}(T,d)$. On the other hand, we have proved in Sect. 3.1 that $\gamma_2(T,d) \le L\sup_{t \in T} I_\mu(t)$.⁵ □


3.3.3 From Majorizing Measures to Sequences of Partitions


In Sect. 3.1, we have proved that given a probability measure $\mu$ on a metric space $T$, we have

$$\gamma_2(T,d) \le L\sup_{t \in T} I_\mu(t) . \tag{3.41}$$

We do not know how to generalize the arguments of the proof to the more general settings we will consider later. We now give a direct proof, following a scheme which we know how to generalize. The contents of this section will not be relevant until Chap. 11. First, we prove that

$$\Delta(T) \le L\sup_{t \in T} I_\mu(t) . \tag{3.42}$$


For this, we consider $s, t \in T$ with $d(s,t) > \Delta(T)/2$, so that, since the balls $B(t, \Delta(T)/4)$ and $B(s, \Delta(T)/4)$ are disjoint, one of them, say the first one, has measure $\le 1/2$. Then $\mu(B(t,\epsilon)) \le 1/2$ for $\epsilon \le \Delta(T)/4$, and thus $I_\mu(t) \ge \sqrt{\log 2}\,\Delta(T)/4$. We have proved (3.42).


5 The argument by which we have proved this inequality will not generalize, but fortunately there 
is another route, which is described in the next section. 




We start the main argument. We will construct an admissible sequence $(\mathcal{A}_n)$ of partitions of $T$ which witnesses (3.41). For $A \in \mathcal{A}_n$, we also construct an integer $j_n(A)$ as follows: First, we set $\mathcal{A}_0 = \mathcal{A}_1 = \{T\}$ and $j_0(T) = j_1(T) = j_0$, the largest integer with $\Delta(T) \le 2^{-j_0}$. Next, for $n \ge 1$, we require the conditions

$$A \in \mathcal{A}_n \Rightarrow \Delta(A) \le 2^{-j_n(A)+2} \tag{3.43}$$

$$t \in A \in \mathcal{A}_n \Rightarrow \mu(B(t, 2^{-j_n(A)})) \ge N_n^{-1} . \tag{3.44}$$


The construction proceeds as follows: Having constructed $\mathcal{A}_n$, we split each element $A$ of $\mathcal{A}_n$ into at most $N_n$ pieces, ensuring that $\operatorname{card}\mathcal{A}_{n+1} \le N_n^2 = N_{n+1}$. For this, we set

$$A_0 = \{t \in A ;\ \mu(B(t, 2^{-j_n(A)-1})) \ge 1/N_n\} . \tag{3.45}$$

We claim first that we may cover $A_0$ by $\le N_n$ sets, each of diameter $\le 2^{-j_n(A)+1}$. For this, we consider a subset $W$ of $A_0$, maximal with respect to the property that any two points of $W$ are at distance $> 2^{-j_n(A)}$. The balls of radius $2^{-j_n(A)-1}$ centered at the points of $W$ are disjoint, and each of them is of measure $\ge N_n^{-1}$ by (3.45), so that there are $\le N_n$ of them. Since $W$ is maximal, the balls of radius $2^{-j_n(A)}$ centered at the points of $W$ cover $A_0$, and each of them has diameter $\le 2^{-j_n(A)+1}$. Thus, there exists a partition of $A_0$ in $\le N_n$ sets of diameter $\le 2^{-j_n(A)+1}$. The required partition of $A$ consists of these sets $B$ and of $A_1 = A \setminus A_0$. For each set $B$, we set $j_{n+1}(B) = j_n(A) + 1$, and we set $j_{n+1}(A_1) = j_n(A)$, so that conditions (3.43) and (3.44) hold.
This completes the construction. The important point is that 


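$$B \in \mathcal{A}_{n+1} ,\ B \subset A \in \mathcal{A}_n ,\ j_{n+1}(B) = j_n(A) \Rightarrow \forall t \in B ,\ \mu(B(t, 2^{-j_{n+1}(B)-1})) < N_n^{-1} . \tag{3.46}$$

This property holds because if $t \in A$ and $\mu(B(t, 2^{-j_n(A)-1})) \ge N_n^{-1}$, then $t \in A_0$, and the element $B$ of $\mathcal{A}_{n+1}$ which contains $t$ has been assigned the value $j_{n+1}(B) = j_n(A) + 1$.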
To prove (3.41), we will prove that

$$\forall t \in T ,\quad \sum_{n \ge 0} 2^{n/2}\Delta(A_n(t)) \le L\sup_{s \in T} I_\mu(s) .$$


We fix $t \in T$. We set $j(n) = j_n(A_n(t))$ and $a(n) = 2^{n/2}2^{-j(n)}$. Using (3.43), it suffices to prove that

$$\sum_{n \ge 0} a(n) \le L\sup_{s \in T} I_\mu(s) . \tag{3.47}$$
We consider $\alpha = 2$ and the corresponding set $J$ as in (2.84) (leaving to the reader to prove that the sequence $(a(n))$ is bounded, which will become clear soon). Thus, as in (2.91), we have

$$n \in J ,\ n \ge 1 \Rightarrow j(n+1) = j(n) + 1 ,\ j(n-1) = j(n) . \tag{3.48}$$


We enumerate $J \setminus \{0\}$ as a sequence $(n_k)_{k \ge 1}$ (leaving again to the reader the easier case where $J$ is finite), so that $j(n_{k+1}) \ge j(n_k + 1) = j(n_k) + 1$ and

$$\sum_{n \ge 0} a(n) \le La(0) + L\sum_{k \ge 1} a(n_k) . \tag{3.49}$$
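From (3.48), we have $j(n_k - 1) = j(n_k)$. Using (3.46) for $n = n_k - 1$, we obtain

$$\mu(B(t, 2^{-j(n_k)-1})) < N_{n_k-1}^{-1} ,$$

so that $\sqrt{\log(1/\mu(B(t,\epsilon)))} \ge 2^{n_k/2}/L$ for $\epsilon \le 2^{-j(n_k)-1}$. Since $j(n_{k+1}) \ge j(n_k) + 1$, this implies

$$a(n_k) = 2^{n_k/2 - j(n_k)} \le 4\cdot 2^{n_k/2}\big(2^{-j(n_k)-1} - 2^{-j(n_{k+1})-1}\big) \le L\int_{2^{-j(n_{k+1})-1}}^{2^{-j(n_k)-1}}\sqrt{\log\frac{1}{\mu(B(t,\epsilon))}}\,d\epsilon .$$

The intervals of integration are disjoint and contained in $[0, \Delta(T)]$, and $a(0) = 2^{-j_0} < 2\Delta(T)$. Summation of these inequalities and use of (3.42) and (3.49) proves (3.47). □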
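The set $W$ used in the construction above (a subset maximal for the property that its points are at mutual distance $> \delta$) can be produced greedily on a finite set. The following sketch, with hypothetical names and random Euclidean points chosen only for illustration, builds such a $W$ and checks the two facts used in the proof: separation, and the covering of the whole set by the closed balls of radius $\delta$ centered at the points of $W$.

```python
import math
import random

def maximal_separated(points, dist, delta):
    """Greedy subset W, maximal for: distinct points of W are at distance > delta."""
    W = []
    for x in points:
        # keep x only if it is far from every point already chosen
        if all(dist(x, w) > delta for w in W):
            W.append(x)
    return W

rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(500)]
euclid = lambda x, y: math.dist(x, y)
delta = 0.1
W = maximal_separated(pts, euclid, delta)
```

Maximality is what gives the covering: a point that was not added lies within distance $\delta$ of a point added earlier, and $W$ only grows.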


3.4 Witnessing Measures 


Proposition 3.4.1 ([65]) For a metric space $(T,d)$, define

$$\delta_2(T,d) = \sup_\mu\inf_{t \in T} I_\mu(t) , \tag{3.50}$$

where the supremum is taken over all probability measures $\mu$ on $T$.⁶ Then

$$\frac{1}{L}\gamma_2(T,d) \le \delta_2(T,d) \le L\gamma_2(T,d) . \tag{3.51}$$

6 Please observe that the order of the infimum and the supremum is not as in (3.20).
It is obvious that $\inf_{t \in T} I_\mu(t) \le \int_T I_\mu(t)\,d\mu(t)$, so that $\delta_2(T,d) \le \operatorname{Fer}(T,d)$. Thus, the right-hand side of (3.51) follows from the right-hand side of (3.31), while the left-hand side of (3.31) follows from the left-hand side of (3.51).

The most important consequence of (3.51) is that there exists a probability measure $\mu$ on $T$ for which $\inf_{t \in T} I_\mu(t) \ge \gamma_2(T,d)/L$. Such a probability measure “witnesses that the value of $\gamma_2(T,d)$ is large” because of the right-hand side of (3.51). In this spirit, we will call $\mu$ a witnessing measure, and we define its “size” as the quantity $\inf_{t \in T} I_\mu(t)$.⁷ Witnessing measures can be magically convenient. One of the first advances the author made beyond the results of [132] was the realization that witnessing measures yield a proof of Theorem 5.2.1 below which is an order of magnitude easier than the original proof. This approach has now been replaced by the use of Fernique’s functional because, unfortunately, we do not know how to extend the idea of witnessing measure to settings where multiple distances will be considered. Finding proper generalizations of Proposition 3.4.1 to more general settings is an attractive research problem (see in particular Problem 10.15.4).


Proof of Proposition 3.4.1 The right-hand side inequality follows from Proposition 3.3.1 and the trivial fact that $\inf_{t \in T} I_\mu(t) \le \int I_\mu(t)\,d\mu(t)$. The reader should review the material of Sect. 3.1 to follow the proof of the converse. Recalling (3.5), given an organized tree $\mathcal{T}$, we define a measure $\mu$ on $T$ by $\mu(A) = 0$ if $A \cap S_{\mathcal{T}} = \emptyset$ and by

$$t \in S_{\mathcal{T}} \Rightarrow \mu(\{t\}) = \prod_{t \in A \in \mathcal{T},\ c(A) \ge 1}\frac{1}{c(A)} .$$

The intuition is that the mass carried by $A \in \mathcal{T}$ is equally divided between the children of $A$. Then, $I_\mu(t) = \infty$ if $t \notin S_{\mathcal{T}}$. Consider $t \in A \in \mathcal{T}$ and $j = j(A)$. Then, since $\mathcal{T}$ is an organized tree, $B(t, 4^{-j-2})$ meets only one child of $A$, so that $\mu(B(t, 4^{-j-2})) \le 1/c(A)$. Copying the argument of (3.22) readily implies that $LI_\mu(t) \ge \sum_{t \in A \in \mathcal{T}} 4^{-j(A)}\sqrt{\log c(A)}$, from which the result follows by (3.15). □


Exercise 3.4.2 For a metric space $(T,d)$, define

$$\chi_2(T,d) = \sup_\mu\inf\int\sum_{n \ge 0} 2^{n/2}\Delta(A_n(t))\,d\mu(t) ,$$

where the infimum is taken over all admissible sequences and the supremum over all probability measures. It is obvious that $\chi_2(T,d) \le \gamma_2(T,d)$. Prove that $\gamma_2(T,d) \le L\chi_2(T,d)$. Hint: Prove that the functional $\chi_2(T,d)$ satisfies the appropriate growth condition. Warning: The argument takes about half a page and is fairly non-trivial.


7 Thus, a probability measure $\mu$ on $T$ is both a majorizing and a witnessing measure. It bounds $\gamma_2(T,d)$ from above by $L\sup_{t \in T} I_\mu(t)$ and from below by $\inf_{t \in T} I_\mu(t)/L$. Furthermore, one may find $\mu$ such that these two bounds are of the same order.
3.5 An Inequality of Fernique


We end this chapter with a beautiful inequality of Fernique. It will not be used anywhere else in this work, but it is presented to emphasize Fernique’s lasting contributions to the theory of Gaussian processes.


Theorem 3.5.1 Consider a Gaussian process $(X_t)_{t \in T}$. Provide $T$ with the canonical distance associated with this process, and consider a probability measure $\mu$ on $T$. Consider a r.v. $\tau$ of law $\mu$. Then for any probability measure $\nu$ on $T$, we have

$$EX_\tau \le L\int I_\nu(t)\,d\mu(t) + L\Delta(T,d) . \tag{3.52}$$

Here of course, $X_\tau$ is the r.v. $X_{\tau(\omega)}(\omega)$. We leave some technical details aside and prove the result only when $T$ is finite. The basic principle is as follows:


Lemma 3.5.2 Consider a standard Gaussian r.v. $g$ and a set $A$. Then $E1_A|g| \le LP(A)\sqrt{\log(2/P(A))}$.


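Proof We write

$$E1_A|g| = \int_0^\infty P(A \cap \{|g| \ge t\})\,dt \le \int_0^\infty\min\big(P(A), 2\exp(-t^2/2)\big)\,dt .$$

Letting $\alpha = \sqrt{2\log(2/P(A))}$, we split the integral into the regions $t \le \alpha$ and $t \ge \alpha$. We bound the first part by $\alpha P(A)$ and the second by

$$\int_\alpha^\infty 2\exp(-t^2/2)\,dt \le \frac{1}{\alpha}\int_\alpha^\infty 2t\exp(-t^2/2)\,dt = P(A)/\alpha \le LP(A)\alpha . \quad □$$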
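For a given value of $P(A)$, the quantity $E1_A|g|$ is largest when $A$ is a tail event $\{|g| \ge s\}$, since this places the mass of $A$ where $|g|$ is largest. For such events both sides of Lemma 3.5.2 have closed forms, $P(|g| \ge s) = \operatorname{erfc}(s/\sqrt{2})$ and $E1_{\{|g| \ge s\}}|g| = 2e^{-s^2/2}/\sqrt{2\pi}$, so the ratio of the two sides can be tabulated. The sketch below is only a numerical illustration on a grid of values of $s$, not a proof; on this grid the ratio stays below $1.5$.

```python
import math

def lhs_tail(s):
    # E[1_A |g|] for A = {|g| >= s}: twice the integral of x * density(x) over [s, inf)
    return 2.0 * math.exp(-s * s / 2.0) / math.sqrt(2.0 * math.pi)

def prob_tail(s):
    # P(|g| >= s) for a standard Gaussian g
    return math.erfc(s / math.sqrt(2.0))

def ratio(s):
    # E[1_A |g|] / (P(A) sqrt(log(2 / P(A)))), the constant needed in Lemma 3.5.2
    p = prob_tail(s)
    return lhs_tail(s) / (p * math.sqrt(math.log(2.0 / p)))

grid = [k / 10 for k in range(201)]  # s in [0, 20]
ratios = [ratio(s) for s in grid]
worst = max(ratios)
```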


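Corollary 3.5.3 Consider Gaussian r.v.s $(g_i)_{i \le N}$ with $Eg_i^2 \le a^2$. Consider a r.v. $\tau$ valued in $\{1, \ldots, N\}$, and let $\alpha_i = P(\tau = i)$. Then $E|g_\tau| \le La\sum_{i \le N}\alpha_i\sqrt{\log(2/\alpha_i)}$.

Proof Write $|g_\tau| = \sum_{i \le N} 1_{\{\tau = i\}}|g_i|$, and use Lemma 3.5.2 to obtain $E1_{\{\tau = i\}}|g_i| \le La\alpha_i\sqrt{\log(2/\alpha_i)}$. □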
We also need the following elementary convexity inequality: 


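Lemma 3.5.4 Consider numbers $\alpha_i \ge 0$ with $\sum_{i \le N}\alpha_i \le \alpha \le 1$. Then

$$\sum_{i \le N}\alpha_i\sqrt{\log(2/\alpha_i)} \le \alpha\sqrt{\log(2N/\alpha)} . \tag{3.53}$$

Proof Calculus shows that the function $\varphi(x) = x\sqrt{\log(2/x)}$ is concave increasing for $x \le 1$, so that if $\alpha' = \sum_{i \le N}\alpha_i$, then $N^{-1}\sum_{i \le N}\varphi(\alpha_i) \le \varphi(\alpha'/N) \le \varphi(\alpha/N)$. Multiplying by $N$ gives (3.53), since $N\varphi(\alpha/N) = \alpha\sqrt{\log(2N/\alpha)}$. □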
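Inequality (3.53) is easy to test on random data. The sketch below draws random nonnegative numbers $\alpha_i$ with $\sum_i\alpha_i \le \alpha \le 1$ and compares both sides; the sampling scheme and the function names are arbitrary choices made for this illustration.

```python
import math
import random

def phi(x):
    # phi(x) = x * sqrt(log(2 / x)), extended by continuity with phi(0) = 0
    return 0.0 if x == 0 else x * math.sqrt(math.log(2.0 / x))

def check_353(trials=10000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 50)
        alpha = rng.uniform(1e-6, 1.0)
        weights = [rng.random() for _ in range(n)]
        total = sum(weights)
        budget = alpha * rng.random()  # the sum of the alpha_i, at most alpha
        alphas = [budget * w / total for w in weights]
        lhs = sum(phi(a) for a in alphas)
        rhs = alpha * math.sqrt(math.log(2 * n / alpha))  # equals n * phi(alpha / n)
        if lhs > rhs * (1 + 1e-12):
            return False
    return True
```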
The slightly technical part of the proof of Theorem 3.5.1 is contained in the following:

Lemma 3.5.5 Consider probability measures $\mu$ and $\nu$ on a metric space $(T,d)$. Then there exist $n_0 \in \mathbb{Z}$ and a sequence of partitions $(\mathcal{B}_n)_{n \ge n_0}$ (which need not be increasing) of $T$ with the following properties: First, $\mathcal{B}_{n_0}$ contains only one set. Next, the sets of $\mathcal{B}_n$ are of diameter $\le 2^{-n+2}$. Finally,

$$\sum_{n \ge n_0} 2^{-n}\sum_{A \in \mathcal{B}_n,\ B \in \mathcal{B}_{n+1}}\mu(A \cap B)\sqrt{\log(2/\mu(A \cap B))} \le L\int I_\nu(t)\,d\mu(t) + L\Delta(T,d) . \tag{3.54}$$


Proof We consider the largest integer $n_0$ with $2^{-n_0} \ge \Delta(T,d)$. We set $\mathcal{B}_{n_0} = \{T\}$. For $n > n_0$, we proceed as follows: for $k \ge 0$, we set

$$T_{n,k} = \{t \in T ;\ 1/N_{k+1} < \nu(B(t, 2^{-n})) \le 1/N_k\} .$$

The sets $(T_{n,k})_{k \ge 0}$ form a partition of $T$. Consider a subset $V$ of $T_{n,k}$ such that the points of $V$ are at mutual distance $> 2^{-n+1}$. The balls of radius $2^{-n}$ centered at the points of $V$ are disjoint, and by definition of $T_{n,k}$, they have a $\nu$-measure $> 1/N_{k+1}$. Thus, $\operatorname{card} V < N_{k+1}$, and according to Lemma 2.9.3, $e_{k+1}(T_{n,k}) \le 2^{-n+1}$, and thus $T_{n,k}$ can be partitioned into at most $N_{k+1}$ sets of diameter $\le 2^{-n+2}$. We construct such a partition $\mathcal{B}_{n,k}$ of $T_{n,k}$ for each $k$, and we consider the corresponding partition $\mathcal{B}_n$ of $T$.

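We now turn to the proof of (3.54). First we note that $\operatorname{card}\mathcal{B}_{n,k} \le N_{k+1}$ and $\operatorname{card}\mathcal{B}_{n+1,\ell} \le N_{\ell+1}$. Also, $\sum_{A \in \mathcal{B}_{n,k},\ B \in \mathcal{B}_{n+1,\ell}}\mu(A \cap B) \le \mu(T_{n,k} \cap T_{n+1,\ell})$. We then use (3.53) to obtain

$$S_{n,k,\ell} := \sum_{A \in \mathcal{B}_{n,k}}\ \sum_{B \in \mathcal{B}_{n+1,\ell}}\mu(A \cap B)\sqrt{\log(2/\mu(A \cap B))} \le \mu(T_{n,k} \cap T_{n+1,\ell})\sqrt{\log\big(2N_{k+1}N_{\ell+1}/\mu(T_{n,k} \cap T_{n+1,\ell})\big)} .$$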


The left-hand side of (3.54) is $\sum_{n \ge n_0} 2^{-n}\sum_{k,\ell \ge 0} S_{n,k,\ell}$. We will use the decomposition

$$\sum_{k,\ell} S_{n,k,\ell} = \sum_{(k,\ell) \in I(n)} S_{n,k,\ell} + \sum_{(k,\ell) \in J(n)} S_{n,k,\ell} ,$$
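where $I(n) = \{(k,\ell) ;\ \mu(T_{n,k} \cap T_{n+1,\ell}) \ge 1/(N_{k+1}N_{\ell+1})\}$ and $J(n) = \{(k,\ell) ;\ \mu(T_{n,k} \cap T_{n+1,\ell}) < 1/(N_{k+1}N_{\ell+1})\}$. Then

$$\sum_{(k,\ell) \in I(n)} S_{n,k,\ell} \le \sum_{k,\ell}\mu(T_{n,k} \cap T_{n+1,\ell})\sqrt{\log(2N_{k+1}^2N_{\ell+1}^2)} \le L\sum_{k,\ell}\mu(T_{n,k} \cap T_{n+1,\ell})(2^{k/2} + 2^{\ell/2}) \le L\sum_k\mu(T_{n,k})2^{k/2} + L\sum_\ell\mu(T_{n+1,\ell})2^{\ell/2} . \tag{3.55}$$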
k £ 


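Now the definition of $T_{n,k}$ shows that

$$\sum_{k \ge 1}\mu(T_{n,k})2^{k/2} \le L\int\sqrt{\log(1/\nu(B(t, 2^{-n})))}\,d\mu(t) , \tag{3.56}$$

and thus

$$\sum_{n \ge n_0} 2^{-n}\sum_{k \ge 1}\mu(T_{n,k})2^{k/2} \le L\int I_\nu(t)\,d\mu(t) .$$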


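Next, since the function $\varphi(x) = x\sqrt{\log(2N_{k+1}N_{\ell+1}/x)}$ increases for $x \le N_{k+1}N_{\ell+1}$, for $(k,\ell) \in J(n)$ we have $\varphi(\mu(T_{n,k} \cap T_{n+1,\ell})) \le \varphi(1/(N_{k+1}N_{\ell+1}))$, so that

$$\sum_{(k,\ell) \in J(n)} S_{n,k,\ell} \le \sum_{(k,\ell) \in J(n)}\varphi(\mu(T_{n,k} \cap T_{n+1,\ell})) \le \sum_{(k,\ell) \in J(n)}\varphi\Big(\frac{1}{N_{k+1}N_{\ell+1}}\Big) \le L\sum_{k,\ell}\frac{2^{k/2} + 2^{\ell/2}}{N_{k+1}N_{\ell+1}} \le L . \tag{3.57}$$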


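Combining these estimates yields the desired inequality, since $2^{-n_0} < 2\Delta(T,d)$:

$$\sum_{n \ge n_0} 2^{-n}\sum_{k,\ell \ge 0} S_{n,k,\ell} \le L\int I_\nu(t)\,d\mu(t) + L2^{-n_0} . \quad □$$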


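Proof of Theorem 3.5.1 For $n \ge n_0$ and $A \in \mathcal{B}_n$, we fix an arbitrary point $t_{n,A} \in A$. We lighten notation by writing $t_0 = t_{n_0,T}$. We define $\pi_n(t) = t_{n,A}$ for $t \in A \in \mathcal{B}_n$. We write

$$X_\tau - X_{t_0} = \sum_{n \ge n_0}\big(X_{\pi_{n+1}(\tau)} - X_{\pi_n(\tau)}\big) , \tag{3.58}$$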
so that, defining $Y_{n,\tau} := X_{\pi_{n+1}(\tau)} - X_{\pi_n(\tau)}$, we have

$$E|X_\tau - X_{t_0}| \le \sum_{n \ge n_0} E|Y_{n,\tau}| . \tag{3.59}$$


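Given $A \in \mathcal{B}_n$ and $B \in \mathcal{B}_{n+1}$, let us define the variable $Y_{A,B} := X_{t_{n+1,B}} - X_{t_{n,A}}$. The sets $A \cap B$ for $A \in \mathcal{B}_n$ and $B \in \mathcal{B}_{n+1}$ form a partition of $T$. When $\tau(\omega) \in A \cap B$, we have $\pi_n(\tau) = t_{n,A}$ and $\pi_{n+1}(\tau) = t_{n+1,B}$, so that $Y_{n,\tau} = Y_{A,B}$. The event $\tau(\omega) \in A \cap B$ has probability $\mu(A \cap B)$ since $\tau$ has law $\mu$. When $A \cap B \ne \emptyset$, we have $d(t_{n+1,B}, t_{n,A}) \le \Delta(A) + \Delta(B) \le L2^{-n}$, so that $(EY_{A,B}^2)^{1/2} = d(t_{n+1,B}, t_{n,A}) \le L2^{-n}$. It then follows from Corollary 3.5.3 that

$$E|Y_{n,\tau}| \le L2^{-n}\sum_{A \in \mathcal{B}_n,\ B \in \mathcal{B}_{n+1}}\mu(A \cap B)\sqrt{\log(2/\mu(A \cap B))} ,$$

and summation over $n$ and use of (3.54) finishes the proof, since $EX_{t_0} = 0$. □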


It is actually possible to give a complete geometric description of the quantity $\sup_\tau EX_\tau$, where the supremum is taken over all r.v.s $\tau$ of given law $\mu$ (see [98]).


Key Ideas to Remember 


• Trees in a metric space $(T,d)$ are well-separated structures which are easy to visualize and provide a convenient way to measure the size of this metric space, by the size of the largest tree it contains. For suitable classes of trees, this measure of size is equivalent to $\gamma_2(T,d)$.

• One may also measure the size of a metric space by the existence of certain probability measures on this space. Fernique’s majorizing measures were used early to control from above the size of a metric space in a way very similar to the functional $\gamma_2(T,d)$, which is, however, far more technically convenient than majorizing measures.

• An offshoot of the idea of majorizing measures, Fernique’s functional, is an equivalent way to measure the size of a metric space and will be of fundamental importance in the sequel.

• The size of a metric space $(T,d)$ can also be bounded from below by the existence of well-scattered probability measures on $T$.


Chapter 4
Matching Theorems


We remind the reader that before attacking any chapter, she should find it useful to read the overview of this chapter, which is provided in the appropriate subsection of Chap. 1. Here, this overview should help her understand the overall approach.


4.1 The Ellipsoid Theorem 


As pointed out after Proposition 2.13.2, an ellipsoid $\mathcal{E}$ is in some sense considerably smaller than what one would predict by looking only at the numbers $e_n(\mathcal{E})$. We will trace the roots of this phenomenon to a simple geometric property, namely, that an ellipsoid is “sufficiently convex”, and we will formulate a general version of this principle for sufficiently convex bodies. The case of ellipsoids already suffices to provide tight upper bounds on certain matchings, which is the main goal of the present chapter. The general case is at the root of certain very deep facts of Banach space theory, such as Bourgain’s celebrated solution of the $\Lambda_p$ problem in Sects. 19.3.1 and 19.3.2.
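Recall the ellipsoid $\mathcal{E}$ of (2.154), which is defined as the set

$$\mathcal{E} = \Big\{t \in \ell^2 ;\ \sum_{i \ge 1}\frac{t_i^2}{a_i^2} \le 1\Big\} \tag{2.154}$$

and is the unit ball of the norm

$$\|x\|_{\mathcal{E}} = \Big(\sum_{i \ge 1}\frac{x_i^2}{a_i^2}\Big)^{1/2} . \tag{4.1}$$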
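Lemma 4.1.1 We have

$$\|x\|_{\mathcal{E}}, \|y\|_{\mathcal{E}} \le 1 \Rightarrow \Big\|\frac{x+y}{2}\Big\|_{\mathcal{E}} \le 1 - \frac{\|x-y\|_{\mathcal{E}}^2}{8} . \tag{4.2}$$

Proof The parallelogram identity implies

$$\|x-y\|_{\mathcal{E}}^2 + \|x+y\|_{\mathcal{E}}^2 = 2\|x\|_{\mathcal{E}}^2 + 2\|y\|_{\mathcal{E}}^2 \le 4 ,$$

so that

$$\|x+y\|_{\mathcal{E}}^2 \le 4 - \|x-y\|_{\mathcal{E}}^2$$

and

$$\Big\|\frac{x+y}{2}\Big\|_{\mathcal{E}} \le \Big(1 - \frac{1}{4}\|x-y\|_{\mathcal{E}}^2\Big)^{1/2} \le 1 - \frac{1}{8}\|x-y\|_{\mathcal{E}}^2 . \quad □$$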


Since (4.2) is the only property of ellipsoids we will use, it clarifies matters to state the following definition:

Definition 4.1.2 Consider a number $p \ge 2$. A norm $\|\cdot\|$ in a Banach space is called p-convex if for a certain number $\eta > 0$ we have

$$\|x\|, \|y\| \le 1 \Rightarrow \Big\|\frac{x+y}{2}\Big\| \le 1 - \eta\|x-y\|^p . \tag{4.3}$$


Saying just that the unit ball of the Banach space is convex implies that for $\|x\|, \|y\| \le 1$, we have $\|(x+y)/2\| \le 1$. Here, (4.3) quantitatively improves on this inequality. Geometrically, it means that the unit ball of the Banach space is “round enough”.

Thus, (4.2) implies that the Banach space $\ell^2$ provided with the norm $\|\cdot\|_{\mathcal{E}}$ is 2-convex. For $1 < q < \infty$, the classical Banach space $L^q$ is p-convex where $p = \max(2,q)$. The reader is referred to [57] for this result and any other classical facts about Banach spaces. Let us observe that taking $y = -x$ in (4.3), we must have

$$2^p\eta \le 1 . \tag{4.4}$$


In this section, we shall study the metric space $(T,d)$ where $T$ is the unit ball of a p-convex Banach space $B$ and where $d$ is the distance induced on $B$ by another norm $\|\cdot\|_\sim$. This concerns in particular the case where $T$ is the ellipsoid (2.154) and $\|\cdot\|_\sim$ is the $\ell^2$ norm.
Given a metric space $(T,d)$, we consider the functionals

$$\gamma_{\alpha,\beta}(T,d) = \inf\Big(\sup_{t \in T}\sum_{n \ge 0}\big(2^{n/\alpha}\Delta(A_n(t))\big)^\beta\Big)^{1/\beta} , \tag{4.5}$$

where $\alpha$ and $\beta$ are positive numbers and where the infimum is over all admissible sequences $(\mathcal{A}_n)$. Thus, with the notation of Definition 2.7.3, we have $\gamma_{\alpha,1}(T,d) = \gamma_\alpha(T,d)$. For matchings, the important functionals are $\gamma_{2,2}(T,d)$ and $\gamma_{1,2}(T,d)$ (but it requires no extra effort to consider the general case). The importance of these functionals is that under certain conditions, they nicely relate to $\gamma_2(T,d)$ through Hölder’s inequality. We explain right now how this is done, even though this spoils the surprise of how the terms $\sqrt{\log N}$ occur in Sect. 4.5.


Lemma 4.1.3 Consider a finite metric space $T$, and assume that $\operatorname{card} T \le N_m$. Then

$$\gamma_2(T,d) \le \sqrt{m}\,\gamma_{2,2}(T,d) . \tag{4.6}$$


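Proof Since $T$ is finite, there exists¹ an admissible sequence $(\mathcal{A}_n)$ of $T$ for which

$$\forall t \in T ,\quad \sum_{n \ge 0} 2^n\Delta(A_n(t))^2 \le \gamma_{2,2}(T,d)^2 . \tag{4.7}$$

Since $\operatorname{card} T \le N_m$, we may assume that $\mathcal{A}_m$ consists of all the sets $\{t\}$ for $t \in T$. Then $A_n(t) = \{t\}$ for each $t$ and each $n \ge m$, so that in (4.7) the sum is really over $n \le m-1$. Since for any numbers $(a_n)_{0 \le n \le m-1}$ we have $\sum_{0 \le n \le m-1} a_n \le \sqrt{m}\,\big(\sum_{0 \le n \le m-1} a_n^2\big)^{1/2}$ by the Cauchy-Schwarz inequality, it follows that

$$\forall t \in T ,\quad \sum_{n \ge 0} 2^{n/2}\Delta(A_n(t)) \le \sqrt{m}\,\gamma_{2,2}(T,d) . \quad □$$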
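The only inequality used in the proof of Lemma 4.1.3 is Cauchy-Schwarz on $m$ terms. In the sketch below (our own illustration), `deltas[n]` plays the role of $\Delta(A_n(t))$ for $0 \le n \le m-1$. The bound $\sum_n 2^{n/2}\Delta_n \le \sqrt{m}\,(\sum_n 2^n\Delta_n^2)^{1/2}$ holds for every choice, with equality exactly when $2^{n/2}\Delta_n$ does not depend on $n$, which is the near-equality situation discussed after the lemma.

```python
import math
import random

def gamma2_sum(deltas):
    # sum_n 2^{n/2} Delta_n, the quantity bounded in (4.6)
    return sum(2 ** (n / 2) * d for n, d in enumerate(deltas))

def gamma22_root(deltas):
    # (sum_n 2^n Delta_n^2)^{1/2}, cf. (4.7)
    return math.sqrt(sum(2 ** n * d * d for n, d in enumerate(deltas)))

m = 8
flat = [2 ** (-n / 2) for n in range(m)]  # 2^{n/2} Delta_n constant: the equality case
rng = random.Random(2)
rand = [rng.random() * 2 ** (-n / 2) for n in range(m)]
```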


How to relate the functionals $\gamma_{1,2}$ and $\gamma_2$ by a similar argument is shown in Lemma 4.7.9.

We may wonder how it is possible, using something as simple as the Cauchy-Schwarz inequality in Lemma 4.1.3, that we can ever get essentially exact results. At a general level, the answer is obvious: it is because we use this inequality in the case of near equality. That this is indeed the case for the ellipsoids of Corollary 4.1.7 is a non-trivial fact about the geometry of these ellipsoids.


1 Since there are only finitely many admissible sequences, the infimum over these is achieved.
Theorem 4.1.4 If $T$ is the unit ball of a p-convex Banach space, if $\eta$ is as in (4.3), and if the distance $d$ on $T$ is induced by another norm, then

$$\gamma_{\alpha,p}(T,d) \le K(\alpha,p,\eta)\sup_{n \ge 0} 2^{n/\alpha}e_n(T,d) . \tag{4.8}$$


Before we prove this result (in Sect. 4.2), we explore some of its consequences. 
The following exercise stresses the main point of this theorem: 


Exercise 4.1.5 Consider a general metric space $(T,d)$.

(a) Prove that

$$\gamma_{\alpha,p}(T,d) \le K(\alpha)\Big(\sum_{n \ge 0}\big(2^{n/\alpha}e_n(T,d)\big)^p\Big)^{1/p} \tag{4.9}$$

and that

$$\sup_{n \ge 0} 2^{n/\alpha}e_n(T,d) \le K(\alpha)\gamma_{\alpha,p}(T,d) . \tag{4.10}$$

(b) Prove that it is essentially impossible in general to improve on (4.9). Hint: You probably want to review Chap. 3 before you try this.

Thus, knowing only the numbers $e_n(T,d)$, we would expect only the general bound (4.9). The content of Theorem 4.1.4 is that the size of $T$, as measured by the functional $\gamma_{\alpha,p}$, is actually much smaller than that.


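Corollary 4.1.6 (The Ellipsoid Theorem) Consider the ellipsoid $\mathcal{E}$ of (2.154) and $\alpha \ge 1$. Then²

$$\gamma_{\alpha,2}(\mathcal{E}) \le K(\alpha)\sup_{\epsilon > 0}\epsilon\big(\operatorname{card}\{i ;\ a_i \ge \epsilon\}\big)^{1/\alpha} . \tag{4.11}$$

Proof Without loss of generality, we may assume that the sequence $(a_i)$ is non-increasing. We apply Theorem 4.1.4 to the case $\|\cdot\| = \|\cdot\|_{\mathcal{E}}$, where $d$ is the distance of $\ell^2$, and we get

$$\gamma_{\alpha,2}(\mathcal{E}) \le K(\alpha)\sup_{n \ge 0} 2^{n/\alpha}e_n(\mathcal{E}) .$$

To bound the right-hand side, we write

$$\sup_{n \ge 0} 2^{n/\alpha}e_n(\mathcal{E}) \le 2^{3/\alpha}e_0(\mathcal{E}) + \sup_{n \ge 0} 2^{(n+3)/\alpha}e_{n+3}(\mathcal{E}) .$$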


2 Recall that a subset of $\ell^2$ is always provided with the distance induced by the $\ell^2$ norm.
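We now proceed as in the proof of Proposition 2.13.2. Using (2.166), we have

$$\sup_{n \ge 0} 2^{(n+3)/\alpha}e_{n+3}(\mathcal{E}) \le K(\alpha)\sup_{n \ge 0} 2^{n/\alpha}\max_{k \le n} 2^{k-n}a_{2^k} = K(\alpha)\sup_{0 \le k \le n} 2^{n/\alpha + k - n}a_{2^k} = K(\alpha)\sup_{k \ge 0} 2^{k/\alpha}a_{2^k} , \tag{4.12}$$

where the last equality holds because $\alpha \ge 1$, so that for fixed $k$ the exponent $n/\alpha + k - n$ is largest for $n = k$. Since $e_0(\mathcal{E}) \le a_1$, we have proved that $\gamma_{\alpha,2}(\mathcal{E}) \le K(\alpha)\sup_{n \ge 0} 2^{n/\alpha}a_{2^n}$. Finally, the choice $\epsilon = a_{2^n}$ shows that

$$2^{n/\alpha}a_{2^n} \le \sup_{\epsilon > 0}\epsilon\big(\operatorname{card}\{i ;\ a_i \ge \epsilon\}\big)^{1/\alpha} ,$$

since $\operatorname{card}\{i ;\ a_i \ge a_{2^n}\} \ge 2^n$ because the sequence $(a_i)$ is non-increasing. □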
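For a finite non-increasing sequence $(a_i)$, the quantities in the last step are easy to compute. The sketch below (illustrative helper names of our own) evaluates $\sup_{\epsilon > 0}\epsilon(\operatorname{card}\{i ;\ a_i \ge \epsilon\})^{1/\alpha}$, whose supremum is attained at one of the values $\epsilon = a_i$, and the dyadic quantity $\sup_k 2^{k/\alpha}a_{2^k}$; for $a_i = i^{-1/2}$ and $\alpha = 2$ both are equal to 1.

```python
def card_sup(a, alpha):
    """sup_{eps > 0} eps * card{i : a_i >= eps}^{1/alpha}; the sup is attained at some eps = a_i."""
    return max(e * sum(1 for x in a if x >= e) ** (1.0 / alpha) for e in a)

def dyadic_sup(a, alpha):
    """sup_k 2^{k/alpha} a_{2^k} for a non-increasing sequence (1-based indices as in the text)."""
    a = sorted(a, reverse=True)
    best, k = 0.0, 0
    while 2 ** k <= len(a):
        best = max(best, 2 ** (k / alpha) * a[2 ** k - 1])
        k += 1
    return best

a = [i ** -0.5 for i in range(1, 1025)]
```

The choice of $\epsilon$ at the values $a_i$ reflects that $\epsilon \mapsto \operatorname{card}\{i ;\ a_i \ge \epsilon\}$ is constant between consecutive values of the sequence.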


The restriction $\alpha \ge 1$ is inessential and can be removed by a suitable modification of (2.166). The important cases are $\alpha = 1$ and $\alpha = 2$. We will use the following convenient reformulation:


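Corollary 4.1.7 Consider a countable set $J$, numbers $(b_j)_{j \in J}$, and the ellipsoid

$$\mathcal{E} = \Big\{x \in \ell^2(J) ;\ \sum_{j \in J} b_j^2x_j^2 \le 1\Big\} .$$

Then

$$\gamma_{\alpha,2}(\mathcal{E}) \le K(\alpha)\sup_{u > 0}\frac{1}{u}\big(\operatorname{card}\{j \in J ;\ |b_j| \le u\}\big)^{1/\alpha} .$$

Proof Without loss of generality, we can assume that $J = \mathbb{N}$. We then set $a_j = 1/|b_j|$, we apply Corollary 4.1.6, and we set $\epsilon = 1/u$. □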


We give right away a striking application of this result. This application is at the root of the results of Sect. 4.7. We denote by $\lambda$ Lebesgue’s measure.

Proposition 4.1.8 Consider the set $\mathcal{L}$ of functions $f : [0,1] \to \mathbb{R}$ such that $f(0) = f(1) = 0$, $f$ is continuous on $[0,1]$, $f$ is differentiable outside a finite set, and $\sup|f'| \le 1$.³ Then $\gamma_{1,2}(\mathcal{L}, d_2) \le L$, where $d_2(f,g) = \|f - g\|_2 = \big(\int_{[0,1]}(f - g)^2\,d\lambda\big)^{1/2}$.


3 The same result holds for the set $\mathcal{L}'$ of 1-Lipschitz functions $f$ with $f(0) = f(1) = 0$, since $\mathcal{L}$ is dense in $\mathcal{L}'$.
Proof The very beautiful idea (due to Coffman and Shor [26]) is to use the Fourier transform to represent $\mathcal{L}$ as a subset of an ellipsoid. The Fourier coefficients of a function $f \in \mathcal{L}$ are defined for $p \in \mathbb{Z}$ by

$$c_p(f) = \int_0^1\exp(2\pi ipx)f(x)\,dx .$$

The key fact is the Plancherel formula,

$$\|f\|_2 = \Big(\sum_{p \in \mathbb{Z}}|c_p(f)|^2\Big)^{1/2} , \tag{4.13}$$

which states that the Fourier transform is an isometry from $L^2([0,1])$ into $\ell^2_{\mathbb{C}}(\mathbb{Z})$.
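Thus, if

$$\mathcal{D} = \{(c_p(f))_{p \in \mathbb{Z}} ;\ f \in \mathcal{L}\} ,$$

the metric space $(\mathcal{L}, d_2)$ is isometric to a subspace of $(\mathcal{D}, d)$, where $d$ is the distance induced by $\ell^2_{\mathbb{C}}(\mathbb{Z})$. It is then obvious from the definition that $\gamma_{1,2}(\mathcal{L}, d_2) \le \gamma_{1,2}(\mathcal{D}, d)$, so that it suffices to prove that $\gamma_{1,2}(\mathcal{D}, d) < \infty$. By integration by parts, and since $f(0) = f(1) = 0$, we have $c_p(f') = -2\pi ip\,c_p(f)$, so that, using (4.13) for $f'$, we get

$$\sum_{p \in \mathbb{Z}} 4\pi^2p^2|c_p(f)|^2 = \sum_{p \in \mathbb{Z}}|c_p(f')|^2 = \|f'\|_2^2 \le 1 .$$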

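For $f \in \mathcal{L}$, we have $f(0) = 0$ and $|f'| \le 1$, so that $|f| \le 1$ and $|c_0(f)| \le 1$. Thus, for $f \in \mathcal{L}$, we have

$$|c_0(f)|^2 + \sum_{p \in \mathbb{Z}} p^2|c_p(f)|^2 \le 2 ,$$

and thus $\mathcal{D}$ is a subset of the complex ellipsoid $\mathcal{E}$ in $\ell^2_{\mathbb{C}}(\mathbb{Z})$ defined by

$$\mathcal{E} = \Big\{(c_p) \in \ell^2_{\mathbb{C}}(\mathbb{Z}) ;\ \sum_{p \in \mathbb{Z}}\max(1, p^2)|c_p|^2 \le 2\Big\} .$$

Viewing each complex number $c_p$ as a pair $(x_p, y_p)$ of real numbers with $|c_p|^2 = x_p^2 + y_p^2$ yields that $\mathcal{E}$ is (isometric to) the real ellipsoid defined by

$$\sum_{p \in \mathbb{Z}}\max(1, p^2)(x_p^2 + y_p^2) \le 2 .$$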
peZ 
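We then apply Corollary 4.1.7 as follows (the right-hand side 2 instead of 1 in the definition of $\mathcal{E}$ only affects the constant): The set $J$ consists of two copies of $\mathbb{Z}$. There is a two-to-one map $\varphi$ from $J$ to $\mathbb{Z}$, and $b_j = \max(1, |\varphi(j)|)$. Then $\operatorname{card}\{j \in J ;\ |b_j| \le u\} \le Lu$ for $u \ge 1$ and $= 0$ for $u < 1$. □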
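The two facts about Fourier coefficients used in this proof can be checked numerically on a concrete member of $\mathcal{L}$. The sketch below, an illustration only, takes the tent function $f(x) = \min(x, 1-x)$, approximates $c_p(f) = \int_0^1\exp(2\pi ipx)f(x)\,dx$ by the midpoint rule for $|p| \le 50$ (the truncation level and the grid size are arbitrary choices), and evaluates the truncated sums $\sum_p 4\pi^2p^2|c_p(f)|^2$, which by Plancherel approach $\|f'\|_2^2 = 1$, and $\sum_p\max(1,p^2)|c_p(f)|^2$, which must be at most 2.

```python
import cmath
import math

def fourier_coeff(f, p, m=2000):
    """c_p(f) = int_0^1 exp(2 pi i p x) f(x) dx, approximated by the midpoint rule with m cells."""
    h = 1.0 / m
    return h * sum(cmath.exp(2j * math.pi * p * (k + 0.5) * h) * f((k + 0.5) * h) for k in range(m))

def tent(x):
    # a member of the class L: f(0) = f(1) = 0 and |f'| = 1 outside x = 1/2
    return min(x, 1.0 - x)

P = 50  # truncation level for the sums over p
coeffs = {p: fourier_coeff(tent, p) for p in range(-P, P + 1)}
ellipsoid_sum = sum(max(1, p * p) * abs(c) ** 2 for p, c in coeffs.items())
derivative_sum = sum(4 * math.pi ** 2 * p * p * abs(c) ** 2 for p, c in coeffs.items())
```

Here $c_0(f) = 1/4$, and the truncated derivative sum falls slightly short of 1 because the terms with $|p| > 50$ are omitted.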


Exercise 4.1.9

(a) For $k \ge 1$, consider the space $T = \{0,1\}^{2k}$. Writing $t = (t_i)_{i \le 2k}$ a point of $T$, consider on $T$ the distance $d(t,t') = 2^{-j-1}$, where $j = \min\{i \le 2k ;\ t_i \ne t'_i\}$. Consider the set $\mathcal{L}$ of 1-Lipschitz functions on $(T,d)$ which are


zero at $t = (0, \ldots, 0)$. Prove that $\gamma_{1,2}(\mathcal{L}, d_\infty) \le L\sqrt{k}$, where $d_\infty$ denotes the distance induced by the uniform norm. Hint: Use Lemma 4.5.18 to prove that $e_n(\mathcal{L}, d_\infty) \le L2^{-n}$, and conclude using (4.9).

(b) Let $\mu$ denote the uniform probability on $T$ and $d_2$ the distance induced by $L^2(\mu)$. It can be shown that $\gamma_{1,2}(\mathcal{L}, d_2) \ge \sqrt{k}/L$. (This could be challenging even if you master Chap. 3.) Meditate upon the difference with Proposition 4.1.8.


4.2 Partitioning Scheme II 


Consider parameters $\alpha, p \ge 1$.

Theorem 4.2.1 Consider a metric space $(T,d)$ and a number $r \ge 4$. Assume that for $j \in \mathbb{Z}$, we are given functions $s_j \ge 0$ on $T$ with the following property:

Whenever we consider a subset $A$ of $T$ and $j \in \mathbb{Z}$ with $\Delta(A) \le 2r^{-j}$, then for each $n \ge 1$ either $e_n(A) \le r^{-j-1}$, or else there exists $t \in A$ with $s_j(t) \ge (2^{n/\alpha}r^{-j-1})^p$. (4.14)

Then we can find an admissible sequence $(\mathcal{A}_n)$ of partitions such that

$$\forall t \in T ,\quad \sum_{n \ge 0}\big(2^{n/\alpha}\Delta(A_n(t))\big)^p \le K(\alpha,p,r)\Big(\Delta(T,d)^p + \sup_{t \in T}\sum_{j \in \mathbb{Z}} s_j(t)\Big) . \tag{4.15}$$

The proof is identical to that of Theorem 2.9.8, which corresponds to the case $\alpha = 2$ and $p = 1$.


Proof of Theorem 4.1.4 We recall that by hypothesis $T$ is the unit ball for the norm $\|\cdot\|$ of a p-convex Banach space (but we study $T$ for the metric $d$ induced by a different norm). For $t \in T$ and $j \in \mathbb{Z}$, we set

$$c_j(t) = \inf\{\|v\| ;\ v \in B_d(t, r^{-j}) \cap T\} \le 1 , \tag{4.16}$$
where the index $d$ emphasizes that the ball is for the distance $d$ rather than for the norm. Since $T$ is the unit ball, we have $c_j(t) \le 1$. Let us set

$$D = \sup_{n \ge 0} 2^{n/\alpha}e_n(T,d) . \tag{4.17}$$
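The proof relies on Theorem 4.2.1 for the functions

$$s_j(t) = KD^p\big(c_{j+2}(t) - c_{j-1}(t)\big) , \tag{4.18}$$

for a suitable value of $K$. Since $j \mapsto c_j(t)$ is non-decreasing with values in $[0,1]$, the sum below telescopes, and it is clear that

$$\forall t \in T ,\quad \sum_{j \in \mathbb{Z}} s_j(t) \le 3KD^p ,$$

and (using also that $\Delta(T,d) \le 2e_0(T,d)$) the issue is to prove that (4.14) holds for a suitable constant $K$ in (4.18). Consider then a set $A \subset T$ with $\Delta(A) \le 2r^{-j}$, consider $n \ge 1$, and assume that $e_n(A) > a := r^{-j-1}$. The goal is to find $t \in A$ such that $s_j(t) \ge (2^{n/\alpha}r^{-j-1})^p$, i.e.,

$$KD^p\big(c_{j+2}(t) - c_{j-1}(t)\big) \ge (2^{n/\alpha}r^{-j-1})^p . \tag{4.19}$$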


For this, let $m = N_n$. According to Lemma 2.9.3(a), there exist points $(t_\ell)_{\ell \le m}$ in $A$ such that $d(t_\ell, t_{\ell'}) \ge a$ whenever $\ell \ne \ell'$. We will show that one of the points $t_\ell$ satisfies (4.19). Consider $H_\ell = T \cap B_d(t_\ell, a/r) = T \cap B_d(t_\ell, r^{-j-2})$. By definition of $c_{j+2}(t_\ell)$, we have $c_{j+2}(t_\ell) = \inf\{\|v\| \,;\, v \in H_\ell\}$. The basic idea is that the points of the different sets $H_\ell$ cannot be too close to each other for the norm of $T$ because there are $N_n$ such sets. So, since the norm is sufficiently convex, we will find a point in the convex hull of these sets with a norm quite smaller than $\max_{\ell \le m} c_{j+2}(t_\ell)$. To implement the idea, consider $u'$ such that

$$2 \ge u' > \max_{\ell \le m}\inf\{\|v\| \,;\, v \in H_\ell\} = \max_{\ell \le m} c_{j+2}(t_\ell) . \tag{4.20}$$


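For $\ell \le m$, consider $v_\ell \in H_\ell$ with $\|v_\ell\| \le u'$. It follows from (4.3) that for $\ell, \ell' \le m$,

$$\Big\|\frac{v_\ell + v_{\ell'}}{2u'}\Big\| \le 1 - \eta\Big\|\frac{v_\ell - v_{\ell'}}{u'}\Big\|^p . \tag{4.21}$$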
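Set

$$u = \inf\Big\{\|v\| \,;\, v \in \operatorname{conv}\bigcup_{\ell \le m} H_\ell\Big\} . \tag{4.22}$$

Since $(v_\ell + v_{\ell'})/2 \in \operatorname{conv}\bigcup_{\ell \le m} H_\ell$, by definition of $u$, we have $u \le \|v_\ell + v_{\ell'}\|/2$, and (4.21) implies

$$\frac{u}{u'} \le 1 - \eta\Big\|\frac{v_\ell - v_{\ell'}}{u'}\Big\|^p ,$$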


so that, using that $u' \le 2$ in the second inequality below,

$$\|v_\ell - v_{\ell'}\| \le u'\Big(\frac{u' - u}{\eta u'}\Big)^{1/p} \le R := 2\Big(\frac{u' - u}{\eta}\Big)^{1/p} ,$$


and hence the points $w_\ell := R^{-1}(v_\ell - v_1)$ belong to the unit ball $T$. Now, since $H_\ell \subset B_d(t_\ell, a/r)$, we have $v_\ell \in B_d(t_\ell, a/r)$. Since $r \ge 4$, we have $d(v_\ell, v_{\ell'}) \ge a/2$ for $\ell \ne \ell'$, and since the distance $d$ arises from a norm, by homogeneity, we have $d(w_\ell, w_{\ell'}) \ge R^{-1}a/2$ for $\ell \ne \ell'$. Then Lemma 2.9.3(c) implies that $e_{n-1}(T, d) \ge R^{-1}a/4$, so that from (4.17) it holds that $2^{(n-1)/\alpha}R^{-1}a/4 \le D$, and recalling that $R = 2((u' - u)/\eta)^{1/p}$, we obtain

$$\big(2^{n/\alpha} r^{-j-1}\big)^p \le K D^p (u' - u) ,$$


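where $K$ depends only on $\alpha$, $p$ and $\eta$. Since this holds for any $u'$ as in (4.20), and $u'$ may be taken arbitrarily close to $\max_{\ell \le m} c_{j+2}(t_\ell)$, there exists $\ell$ such that

$$\big(2^{n/\alpha} r^{-j-1}\big)^p \le K D^p\big(c_{j+2}(t_\ell) - u\big) . \tag{4.23}$$

Now, by construction, for $\ell' \le m$, we have

$$H_{\ell'} \subset B_d(t_{\ell'}, a/r) = B_d(t_{\ell'}, r^{-j-2}) \subset B_d(t_\ell, r^{-j+1})$$

since $d(t_\ell, t_{\ell'}) \le 2r^{-j}$ as $t_\ell, t_{\ell'} \in A$ and $\Delta(A) \le 2r^{-j}$ (and $r^{-j-2} + 2r^{-j} \le r^{-j+1}$ because $r \ge 4$). Thus $\operatorname{conv}\bigcup_{\ell' \le m} H_{\ell'} \subset B_d(t_\ell, r^{-j+1}) \cap T$, and from (4.16) and (4.22), we have $u \ge c_{j-1}(t_\ell)$. Together with (4.23), this proves (4.19) for $t = t_\ell$. □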
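The two computations that the proof leaves to the reader can be spelled out as follows. This is a sketch with one admissible choice of constants, worked out by us rather than taken from the text. Since $j \mapsto c_j(t)$ is nondecreasing with values in $[0,1]$, each of the three telescoping sums below is at most $1$:

$$\sum_{j \in \mathbb{Z}}\big(c_{j+2}(t) - c_{j-1}(t)\big) = \sum_{k=-1}^{1}\,\sum_{j \in \mathbb{Z}}\big(c_{j+k+1}(t) - c_{j+k}(t)\big) \le 3 .$$

Next, combining $2^{(n-1)/\alpha}R^{-1}a/4 \le D$ with $R^p = 2^p(u' - u)/\eta$ gives

$$\big(2^{n/\alpha} a\big)^p = \big(2^{1/\alpha}\,2^{(n-1)/\alpha} a\big)^p \le \big(4 \cdot 2^{1/\alpha} D\big)^p R^p = \frac{\big(8 \cdot 2^{1/\alpha}\big)^p}{\eta}\,D^p\,(u' - u) ,$$

so that $K = (8 \cdot 2^{1/\alpha})^p/\eta$ is one possible value of the constant in the inequality preceding (4.23).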


Exercise 4.2.2 Write the previous proof using a certain functional with an appropriate growth condition.


The following generalization of Theorem 4.1.4 yields very precise results when 
applied to ellipsoids. It will not be used in the sequel, so we refer to [132] for a 
proof. 


Theorem 4.2.3 Consider $\beta, \beta', p > 0$ with

$$\frac{1}{\beta} = \frac{1}{\beta'} + \frac{1}{p} . \tag{4.24}$$

Then, under the conditions of Theorem 4.1.4, we have

$$\gamma_{\alpha,\beta}(T, d) \le K(p, \eta, \alpha)\Big(\sum_n\big(2^{n/\alpha} e_n(T, d)\big)^{\beta'}\Big)^{1/\beta'} .$$


Exercise 4.2.4 Use Theorem 4.2.3 to obtain a geometrical proof of (2.159). Hint: Choose $\alpha = 2$, $\beta = 1$, $\beta' = p = 2$ and use (2.166).


4.3 Matchings 


The rest of this chapter is devoted to the following problem. Consider $N$ r.v.s $X_1, \ldots, X_N$ independently and uniformly distributed in the unit cube $[0,1]^d$, where $d \ge 1$. Consider a typical realization of these points. How evenly distributed in $[0,1]^d$ are the points $X_1, \ldots, X_N$? To measure this, we will match the points $(X_i)_{i \le N}$ with nonrandom “evenly distributed” points $(Y_i)_{i \le N}$, that is, we will find a permutation $\pi$ of $\{1, \ldots, N\}$ such that the points $X_i$ and $Y_{\pi(i)}$ are “close”. There are different ways to measure “closeness”. For example, one may wish that the sum of the distances $d(X_i, Y_{\pi(i)})$ be as small as possible (Sect. 4.5), that the maximum distance $d(X_i, Y_{\pi(i)})$ be as small as possible (Sect. 4.7), or one can use more complicated measures of “closeness” (Sect. 17.1).

The case $d = 1$ is by far the simplest. Assuming that the $X_i$ are labeled in a way that $X_1 \le X_2 \le \cdots$ and similarly for the $Y_i$, one has $\mathrm{E}\sum_{i \le N}|X_i - Y_i| \le L\sqrt{N}$. This is a consequence of the classical inequality (which we will later prove as an exercise):

$$\mathrm{E}\sup_{0 \le t \le 1}\big|\operatorname{card}\{i \le N \,;\, X_i \le t\} - Nt\big| \le L\sqrt{N} . \tag{4.25}$$
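The $\sqrt{N}$ scaling in (4.25) is easy to observe numerically. The sketch below is our own illustration, not part of the text: it computes the supremum in (4.25) exactly for one sample (the count is a step function, so it suffices to look at each sorted point and just before it) and averages over independent samples. The last printed column should stay of order one as $N$ grows; we believe it approaches $\sqrt{\pi/2}\,\log 2 \approx 0.87$, the mean of the Kolmogorov distribution, but nothing below depends on that value.

```python
import math
import random

def sup_count_deviation(xs):
    """sup over 0 <= t <= 1 of |card{i : x_i <= t} - N t| for points xs in [0, 1].

    Between consecutive sorted points the deviation is linear in t, so the sup is
    reached at a sorted point x_(i) (count i) or just before it (count i - 1).
    """
    xs = sorted(xs)
    n = len(xs)
    best = 0.0
    for i, x in enumerate(xs, start=1):
        best = max(best, abs(i - n * x), abs((i - 1) - n * x))
    return best

def mean_deviation(n, trials, seed=0):
    """Average of sup_count_deviation over `trials` samples of n uniform points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += sup_count_deviation([rng.random() for _ in range(n)])
    return total / trials

for n in (100, 400, 1600):
    m = mean_deviation(n, trials=200)
    print(n, round(m, 2), round(m / math.sqrt(n), 3))
```

Computing the supremum exactly, rather than on a grid of values of $t$, keeps the example faithful to the quantity in (4.25).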


The case where $d = 2$ is very special and is the object of the present chapter. The case $d \ge 3$ will be studied in Chap. 18. The reader having never thought of the matter might think that the points $X_1, \ldots, X_N$ are very evenly distributed. This is not quite the case; for example, with probability close to one, one is bound to find a little square of area about $N^{-1}\log N$ that contains no point $X_i$. This is a very local irregularity. In a somewhat informal manner, one can say that this irregularity occurs at scale $\sqrt{\log N}/\sqrt{N}$. This specific irregularity is mentioned just as an easy illustration and plays no part in the considerations of the present chapter. What matters here⁴ is that in some sense, there are irregularities at all scales $2^{-k}$ for $1 \le k \le L^{-1}\log N$ and that these are all of the same order. To see this, let us think that we actually move the points $X_i$ to the points $Y_{\pi(i)}$ in straight lines. In a given small square of side $2^{-k}$, there is often an excess of points $X_i$ of order $\sqrt{N2^{-2k}} = 2^{-k}\sqrt{N}$. When matched, these points will leave the square and will cross its boundary. The number of points crossing this boundary per unit of length is independent of the scale $2^{-k}$. It will also often happen that there is a deficit of points $X_i$ in this square of side $2^{-k}$, and in this case, some points $X_i$ will have to cross the boundary to enter it. The flows at really different scales should be roughly independent, and there are about $\log N$ such scales, so when we combine what happens at different scales we should get an extra factor $\sqrt{\log N}$ (and not $\log N$). Crossing our fingers, we should believe that about $\sqrt{N\log N}$ points $X_i$ per unit of length cross a typical interval contained in the square, so that the total length of the segments joining the points $X_i$ to the points $Y_{\pi(i)}$ should be of that order.⁵ This fact that all scales have the same weight is typical of dimension 2. In dimension 1, it is the large scales that matter most, while in dimension $\ge 3$, it is the small ones.

4 This is much harder to visualize and is specific to the case $d = 2$.


Exercise 4.3.1 Perform this calculation. 


One can summarize the situation by saying that

$$\text{obstacles to matchings at different scales may combine in dimension 2 but not in dimension } \ge 3 . \tag{4.26}$$


It is difficult to state a real theorem to this effect, but this is actually seen with great clarity in the proofs. The crucial estimates involve controlling sums, each term of which represents a different scale. In dimension 2, many terms contribute to the final sum (so that many different scales contribute), while in higher dimension, only a few terms contribute. (The case of higher dimension remains non-trivial because which terms contribute depends on the value of the parameter.) Of course, these statements are very mysterious at this stage, but we expect that a serious study of the methods involved will gradually bring the reader to share this view.

What does it mean to say that the nonrandom points $(Y_i)_{i \le N}$ are evenly distributed? When $N$ is a square, $N = n^2$, everybody will agree that the $N$ points $(k/n, \ell/n)$, $1 \le k, \ell \le n$, are evenly distributed, and unless you love details, you are welcome to stick to this case. More generally, we will say that the nonrandom points $(Y_i)_{i \le N}$ are evenly spread if one can cover $[0,1]^2$ with $N$ rectangles with disjoint interiors, such that each rectangle $R$ has area $1/N$, contains exactly one point $Y_i$, and is such that⁶ $R \subset B(Y_i, 10/\sqrt{N})$. To construct such points, one may proceed as follows: Consider the largest integer $k$ with $k^2 \le N$, and observe that $k(k+3) \ge (k+1)^2 > N$, so that (since also $k \cdot k \le N$) there exist integers $(n_i)_{i \le k}$ with $k \le n_i \le k + 3$ and $\sum_{i \le k} n_i = N$. Cut the unit square into $k$ vertical strips, in a way that the $i$-th strip has width $n_i/N$, and to this $i$-th strip attribute $n_i$ points placed at even intervals $1/n_i$.⁷

5 As we will see later, we have guessed the correct result.

6 There is nothing magic about the number 10. Think of it as a universal constant. The last thing I want is to figure out the best possible value. That 10 works should be obvious from the following construction.
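The strip construction can be carried out mechanically. The sketch below is our own: the function name `evenly_spread` and the choice of putting each point at the center of its rectangle are assumptions, not taken from the text. It spreads the surplus $N - k^2$ as evenly as possible over the $k$ strips and returns each point with its rectangle, so that the defining properties can be checked directly. These properties are: $N$ rectangles of area $1/N$, each containing its point and contained in $B(Y_i, 10/\sqrt{N})$.

```python
import math

def evenly_spread(N):
    """Strip construction of N evenly spread points in [0, 1]^2.

    Returns a list of (point, rectangle) pairs, a rectangle being (x0, x1, y0, y1).
    """
    k = math.isqrt(N)                  # largest k with k * k <= N
    extra = N - k * k                  # 0 <= extra <= 2k because N < (k + 1)^2
    q, r = divmod(extra, k)
    widths = [k + q + (1 if i < r else 0) for i in range(k)]   # the n_i, k <= n_i <= k + 3
    assert sum(widths) == N
    pairs = []
    x0 = 0.0
    for n_i in widths:
        x1 = x0 + n_i / N              # the i-th strip has width n_i / N
        for l in range(n_i):           # n_i points at even intervals 1 / n_i
            y0, y1 = l / n_i, (l + 1) / n_i
            point = ((x0 + x1) / 2, (y0 + y1) / 2)
            pairs.append((point, (x0, x1, y0, y1)))
        x0 = x1
    return pairs

pairs = evenly_spread(10)
# half the diagonal bounds the distance from the center to any point of the rectangle
worst = max(math.hypot(x1 - x0, y1 - y0) / 2 for _, (x0, x1, y0, y1) in pairs)
print(len(pairs), worst <= 10 / math.sqrt(10))   # -> 10 True
```

In fact the half-diagonal stays well below $10/\sqrt{N}$, in line with footnote 6.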

The basic tool to construct matchings is the following classical fact. The proof, 
based on the Hahn-Banach theorem, is given in Sect. B.1. 


Proposition 4.3.2 Consider a matrix $C = (c_{ij})_{i,j \le N}$. Let

$$M(C) = \inf_\pi\sum_{i \le N} c_{i\pi(i)} ,$$

where the infimum is over all permutations $\pi$ of $\{1, \ldots, N\}$. Then

$$M(C) = \sup\sum_{i \le N}(w_i + w'_i) , \tag{4.27}$$

where the supremum is over all families $(w_i)_{i \le N}$, $(w'_i)_{i \le N}$ that satisfy

$$\forall i, j \le N\,, \quad w_i + w'_j \le c_{ij} . \tag{4.28}$$

Thus, if $c_{ij}$ is the cost of matching $i$ with $j$, $M(C)$ is the minimal cost of a matching and is given by the “duality formula” (4.27).
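The easy half of (4.27), namely that any family satisfying (4.28) gives a lower bound for $M(C)$, can be checked by brute force on small matrices. A minimal sketch, assuming $N$ is small enough to enumerate all $N!$ permutations. The choice $w_i = \min_j(c_{ij} - w'_j)$ is just one convenient way, of our choosing, to produce families satisfying (4.28) from an arbitrary $(w'_j)$.

```python
import itertools
import random

def min_matching_cost(C):
    """M(C) = min over permutations pi of sum_i C[i][pi(i)] (brute force, small N only)."""
    n = len(C)
    return min(sum(C[i][p[i]] for i in range(n)) for p in itertools.permutations(range(n)))

def dual_value(C, w_prime):
    """With w_i = min_j (C[i][j] - w'_j), (4.28) holds; return sum_i (w_i + w'_i)."""
    n = len(C)
    w = [min(C[i][j] - w_prime[j] for j in range(n)) for i in range(n)]
    return sum(w) + sum(w_prime)

rng = random.Random(1)
n = 6
C = [[rng.random() for _ in range(n)] for _ in range(n)]
M = min_matching_cost(C)
best_dual = max(dual_value(C, [rng.uniform(-1, 1) for _ in range(n)]) for _ in range(2000))
print(M, best_dual)   # best_dual never exceeds M
```

Random search only approaches the supremum in (4.27); reaching it exactly would require a linear-programming or Hungarian-type solver, which we do not attempt here.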
A well-known application of Proposition 4.3.2 is another “duality formula”. 


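Proposition 4.3.3 Consider points $(X_i)_{i \le N}$ and $(Y_i)_{i \le N}$ in a metric space $(T, d)$. Then

$$\inf_\pi\sum_{i \le N} d(X_i, Y_{\pi(i)}) = \sup_{f \in \mathcal{C}}\sum_{i \le N}\big(f(X_i) - f(Y_i)\big) , \tag{4.29}$$

where $\mathcal{C}$ denotes the class of 1-Lipschitz functions on $(T, d)$, i.e., functions $f$ for which $|f(x) - f(y)| \le d(x, y)$.

Proof Given any permutation $\pi$ and any 1-Lipschitz function $f$, we have

$$\sum_{i \le N}\big(f(X_i) - f(Y_i)\big) = \sum_{i \le N}\big(f(X_i) - f(Y_{\pi(i)})\big) \le \sum_{i \le N} d(X_i, Y_{\pi(i)}) .$$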


7 A more elegant approach dispenses with this slightly awkward construction. It is the concept of “transportation cost”. One attributes mass $1/N$ to each point $X_i$, and one measures the “cost of transporting” the resulting probability measure to the uniform probability on $[0,1]^2$. In the presentation, one thus replaces the evenly spread points $Y_i$ by a more canonical object, the uniform probability on $[0,1]^2$. This approach does not make the proofs any easier, so we shall not use it despite its aesthetic appeal.
This proves the inequality $\ge$ in (4.29). To prove the converse, we use (4.27) with $c_{ij} = d(X_i, Y_j)$, so that

$$\inf_\pi\sum_{i \le N} d(X_i, Y_{\pi(i)}) = \sup\sum_{i \le N}(w_i + w'_i) , \tag{4.30}$$

where the supremum is over all families $(w_i)$ and $(w'_i)$ for which

$$\forall i, j \le N\,, \quad w_i + w'_j \le d(X_i, Y_j) . \tag{4.31}$$

Given a family $(w'_i)_{i \le N}$, consider the function

$$f(x) = \min_{j \le N}\big(-w'_j + d(x, Y_j)\big) . \tag{4.32}$$

It is 1-Lipschitz, since it is the minimum of functions which are themselves 1-Lipschitz. By definition, we have $f(Y_j) \le -w'_j$, and by (4.31), for $i \le N$, we have $w_i \le f(X_i)$, so that

$$\sum_{i \le N}(w_i + w'_i) \le \sum_{i \le N}\big(f(X_i) - f(Y_i)\big) . \qquad \square$$
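The construction (4.32) can be tested numerically. The sketch below, our own with random points in the plane and a brute-force matching (so small $N$ only), builds $f$ from an arbitrary family $(w'_j)$, takes the largest $w_i$ allowed by (4.31), and checks the chain of inequalities from the proof: $\sum_i(w_i + w'_i) \le \sum_i(f(X_i) - f(Y_i)) \le \inf_\pi\sum_i d(X_i, Y_{\pi(i)})$.

```python
import itertools
import math
import random

def matching_cost(X, Y):
    """inf over permutations pi of sum_i d(X_i, Y_pi(i)) (brute force, small N only)."""
    n = len(X)
    return min(sum(math.dist(X[i], Y[p[i]]) for i in range(n))
               for p in itertools.permutations(range(n)))

def lipschitz_from_dual(Y, w_prime):
    """The function f(x) = min_j (-w'_j + d(x, Y_j)) of (4.32); it is 1-Lipschitz."""
    return lambda x: min(-wj + math.dist(x, yj) for wj, yj in zip(w_prime, Y))

rng = random.Random(2)
n = 6
X = [(rng.random(), rng.random()) for _ in range(n)]
Y = [(rng.random(), rng.random()) for _ in range(n)]
w_prime = [rng.uniform(-0.5, 0.5) for _ in range(n)]
# Largest w_i compatible with (4.31) for this choice of w'.
w = [min(math.dist(X[i], Y[j]) - w_prime[j] for j in range(n)) for i in range(n)]
f = lipschitz_from_dual(Y, w_prime)

lower = sum(w) + sum(w_prime)
middle = sum(f(X[i]) - f(Y[i]) for i in range(n))
upper = matching_cost(X, Y)
print(lower <= middle + 1e-12, middle <= upper + 1e-12)   # -> True True
```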


Exercise 4.3.4 Consider a function $f$ which achieves the supremum in the right-hand side of (4.29). Prove that for an optimal matching, we have $f(X_i) - f(Y_{\pi(i)}) = d(X_i, Y_{\pi(i)})$. If you know $f$, this basically tells you how to find the matching. To find $Y_{\pi(i)}$, move from $X_i$ in the direction of steepest descent of $f$ until you find a point $Y_j$.


The following is a well-known and rather useful result of combinatorics. We 
deduce it from Proposition 4.3.2 in Sect. B.1, but other proofs exist, based on
different ideas (see, for example, [21] § 2). 


Corollary 4.3.5 (Hall’s Marriage Lemma) Assume that to each $i \le N$, we associate a subset $A(i)$ of $\{1, \ldots, N\}$ and that, for each subset $I$ of $\{1, \ldots, N\}$, we have

$$\operatorname{card}\Big(\bigcup_{i \in I} A(i)\Big) \ge \operatorname{card} I . \tag{4.33}$$

Then we can find a permutation $\pi$ of $\{1, \ldots, N\}$ for which

$$\forall i \le N\,, \quad \pi(i) \in A(i) .$$
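For small $N$, condition (4.33) can be checked by brute force. When it holds, a permutation as in Corollary 4.3.5 can be found by the classical augmenting-path method, often called Kuhn's algorithm. This is a swapped-in constructive technique, not the proof through Proposition 4.3.2 given in Sect. B.1. In the sketch, indices run over $0, \ldots, N-1$ instead of $1, \ldots, N$.

```python
from itertools import combinations

def hall_condition(A):
    """Check (4.33): card(union of A(i), i in I) >= card I for every nonempty I."""
    n = len(A)
    for size in range(1, n + 1):
        for I in combinations(range(n), size):
            if len(set().union(*(A[i] for i in I))) < size:
                return False
    return True

def hall_permutation(A):
    """Return pi (a list) with pi[i] in A[i] for every i, or None if there is none."""
    n = len(A)
    owner = [None] * n                 # owner[j] = i records that pi(i) = j

    def augment(i, seen):
        # Give i some j in A[i], re-assigning the current owner of j if that is possible.
        for j in A[i]:
            if j not in seen:
                seen.add(j)
                if owner[j] is None or augment(owner[j], seen):
                    owner[j] = i
                    return True
        return False

    for i in range(n):
        if not augment(i, set()):
            return None
    pi = [None] * n
    for j, i in enumerate(owner):
        pi[i] = j
    return pi

A = [{0, 1}, {0}, {1, 2}]
print(hall_condition(A), hall_permutation(A))   # -> True [1, 0, 2]
```

Hall's lemma says exactly that `hall_permutation` succeeds whenever `hall_condition` holds; the exhaustive check below confirms this on all families with $N = 3$.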
4.4 Discrepancy Bounds


Generally speaking, the study of expressions of the type

$$\sup_{f \in \mathcal{F}}\Big|\sum_{i \le N}\Big(f(X_i) - \int f\,d\mu\Big)\Big| \tag{4.34}$$

for a class of functions $\mathcal{F}$ will be important in the present book, particularly in Chap. 14. A bound on such a quantity is called a discrepancy bound because, since

$$\Big|\sum_{i \le N}\Big(f(X_i) - \int f\,d\mu\Big)\Big| = N\Big|\frac{1}{N}\sum_{i \le N} f(X_i) - \int f\,d\mu\Big| ,$$

it bounds uniformly on $\mathcal{F}$ the “discrepancy” between the true measure $\int f\,d\mu$ and the “empirical measure” $N^{-1}\sum_{i \le N} f(X_i)$. Finding such a bound simply requires finding a bound for the supremum of the process $(|Z_f|)_{f \in \mathcal{F}}$, where the (centered) r.v.s $Z_f$ are given by⁸

$$Z_f = \sum_{i \le N}\Big(f(X_i) - \int f\,d\mu\Big) , \tag{4.35}$$

a topic at the very center of our attention.
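To make the notation (4.35) concrete, here is a small sketch with an example of our own choosing, in dimension one. Here $\mu$ is the uniform measure on $[0,1]$, and $\mathcal{F}$ is the finite family of 1-Lipschitz functions $f_c(x) = |x - c|$, with $c$ on a grid, for which $\int f_c\,d\mu = (c^2 + (1-c)^2)/2$. The code computes $\sup_{f \in \mathcal{F}}|Z_f|$. For such a small class the printed ratio $\sup_f|Z_f|/\sqrt{N}$ should remain of order one.

```python
import math
import random

def Z(f, integral, xs):
    """Z_f = sum_i (f(X_i) - integral of f), as in (4.35)."""
    return sum(f(x) - integral for x in xs)

def sup_abs_Z(xs, grid_size=50):
    """sup of |Z_f| over the finite family f_c(x) = |x - c|, c = k / grid_size."""
    best = 0.0
    for k in range(grid_size + 1):
        c = k / grid_size
        integral = (c * c + (1 - c) ** 2) / 2       # = integral over [0, 1] of |x - c| dx
        best = max(best, abs(Z(lambda x: abs(x - c), integral, xs)))
    return best

rng = random.Random(4)
for N in (100, 400, 1600):
    xs = [rng.random() for _ in range(N)]
    print(N, round(sup_abs_Z(xs) / math.sqrt(N), 3))
```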

A relation between discrepancy bounds and matching theorems can be guessed 
from Proposition 4.3.3 and will be made explicit in the next section. In this book, 
every matching theorem will be proved through a discrepancy bound. 


4.5 The Ajtai-Komlós-Tusnády Matching Theorem


Theorem 4.5.1 ([3]) If the points $(Y_i)_{i \le N}$ are evenly spread and the points $(X_i)_{i \le N}$ are i.i.d. uniform on $[0,1]^2$, then (for $N \ge 2$)

$$\mathrm{E}\inf_\pi\sum_{i \le N} d(X_i, Y_{\pi(i)}) \le L\sqrt{N\log N} , \tag{4.36}$$

where the infimum is over all permutations of $\{1, \ldots, N\}$ and where $d$ is the Euclidean distance.


The term $\sqrt{N}$ is just a scaling effect. There are $N$ terms $d(X_i, Y_{\pi(i)})$, each of which should be about $1/\sqrt{N}$. The non-trivial part of the theorem is the factor $\sqrt{\log N}$. In Sect. 4.6, we shall show that (4.36) can be reversed, i.e.,

$$\mathrm{E}\inf_\pi\sum_{i \le N} d(X_i, Y_{\pi(i)}) \ge \frac{1}{L}\sqrt{N\log N} . \tag{4.37}$$

8 Please remember this notation which is used throughout this chapter.


In order to understand that the bound (4.36) is not trivial, you can study the 
following greedy matching algorithm which was shown to me by Yash Kanoria: 


Exercise 4.5.2 For each $n \ge 0$, consider the partition $\mathcal{H}_n$ of $[0,1]^2$ into $2^{2n}$ equal squares. Consider the largest integer $n_0$ with $2^{2n_0} \le N$, and proceed as follows: For each small square in $\mathcal{H}_{n_0}$, match as many as possible of the points $X_i$ with points $Y_j$ in the same square. Remove the points $X_i$ and the points $Y_j$ that you have matched this way. For the remaining points, proceed as follows: In each small square of $\mathcal{H}_{n_0 - 1}$, match as many as possible of the remaining points $X_i$ to remaining points $Y_j$ inside the same square. Remove all the points $X_i$ and the points $Y_j$ that you have matched at this stage, and continue in this manner. Prove that the expected cost of the matching thus constructed is $\le L\sqrt{N}\log N$.⁹
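A sketch of the greedy algorithm of Exercise 4.5.2, under assumptions of ours. The $Y_j$ are the centers of a $k \times k$ grid with $N = k^2$, a simple evenly spread configuration. Inside a square, “as many as possible” pairs are formed in arbitrary order, since the exercise does not specify which pairs to form. At the last level ($\mathcal{H}_0$, the whole square) all remaining points are matched. The exercise and its footnote indicate a cost of order $\sqrt{N}\log N$, larger than the $\sqrt{N\log N}$ of (4.36). At the small sizes below the two normalizations are hard to tell apart, so the printed ratios are only an illustration.

```python
import math
import random

def dyadic_cell(p, side):
    """Index of the square of side 1/side containing p (right and top edges go to the last cell)."""
    return (min(int(p[0] * side), side - 1), min(int(p[1] * side), side - 1))

def greedy_dyadic_matching(X, Y):
    """Cost sum_i d(X_i, Y_pi(i)) of the greedy matching, finest level n0 down to level 0."""
    N = len(X)
    assert len(Y) == N
    n0 = 0
    while 4 ** (n0 + 1) <= N:          # largest n0 with 2^(2 n0) <= N
        n0 += 1
    free_x, free_y = list(range(N)), list(range(N))
    cost = 0.0
    for n in range(n0, -1, -1):
        side = 2 ** n
        bx, by = {}, {}
        for i in free_x:
            bx.setdefault(dyadic_cell(X[i], side), []).append(i)
        for j in free_y:
            by.setdefault(dyadic_cell(Y[j], side), []).append(j)
        free_x, free_y = [], []
        for c in set(bx) | set(by):
            xs, ys = bx.get(c, []), by.get(c, [])
            m = min(len(xs), len(ys))  # match as many as possible inside this square
            cost += sum(math.dist(X[i], Y[j]) for i, j in zip(xs[:m], ys[:m]))
            free_x += xs[m:]
            free_y += ys[m:]
    assert not free_x and not free_y   # at level 0 everything gets matched
    return cost

rng = random.Random(0)
for k in (8, 16, 32):
    N = k * k
    Y = [((a + 0.5) / k, (b + 0.5) / k) for a in range(k) for b in range(k)]
    X = [(rng.random(), rng.random()) for _ in range(N)]
    c = greedy_dyadic_matching(X, Y)
    print(N, round(c / math.sqrt(N * math.log(N)), 3), round(c / (math.sqrt(N) * math.log(N)), 3))
```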


Let us state the “discrepancy bound” at the root of Theorem 4.5.1. Consider the class $\mathcal{C}$ of 1-Lipschitz functions on $[0,1]^2$, i.e., of functions $f$ that satisfy

$$\forall x, y \in [0,1]^2\,, \quad |f(x) - f(y)| \le d(x, y) ,$$

where $d$ denotes the Euclidean distance. We denote by $\lambda$ the uniform measure on $[0,1]^2$.

Theorem 4.5.3 We have

$$\mathrm{E}\sup_{f \in \mathcal{C}}\Big|\sum_{i \le N}\Big(f(X_i) - \int f\,d\lambda\Big)\Big| \le L\sqrt{N\log N} . \tag{4.38}$$


Research Problem 4.5.4 Prove that the following limit

$$\lim_{N \to \infty}\frac{1}{\sqrt{N\log N}}\,\mathrm{E}\inf_\pi\sum_{i \le N} d(X_i, Y_{\pi(i)})$$

exists.


At the present time, there does not seem to exist the beginning of a general approach for attacking a problem of this type, and certainly the methods of the present book are not appropriate for this. Quite amazingly, however, the corresponding problem has been solved in the case where the cost of the matching is measured by the square of the distance (see [4]). The methods seem rather specific to the case of the square of a distance.

9 It can be shown that this bound can be reversed.

Theorem 4.5.3 is obviously interesting in its own right and proving it is the goal 
of this section. Before we discuss it, let us put matchings behind us. 


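Proof of Theorem 4.5.1 We recall (4.29), i.e.,

$$\inf_\pi\sum_{i \le N} d(X_i, Y_{\pi(i)}) = \sup_{f \in \mathcal{C}}\sum_{i \le N}\big(f(X_i) - f(Y_i)\big) , \tag{4.39}$$

and we simply write

$$\sum_{i \le N}\big(f(X_i) - f(Y_i)\big) \le \Big|\sum_{i \le N}\Big(f(X_i) - \int f\,d\lambda\Big)\Big| + \Big|\sum_{i \le N}\Big(f(Y_i) - \int f\,d\lambda\Big)\Big| . \tag{4.40}$$

Next, we claim that

$$\Big|\sum_{i \le N}\Big(f(Y_i) - \int f\,d\lambda\Big)\Big| \le L\sqrt{N} . \tag{4.41}$$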


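We recall that since $(Y_i)_{i \le N}$ are evenly spread, one can cover $[0,1]^2$ with $N$ rectangles $R_i$ with disjoint interiors, such that each rectangle $R_i$ has area $1/N$ and is such that $Y_i \in R_i \subset B(Y_i, 10/\sqrt{N})$. Consequently, $N\int f\,d\lambda = N\sum_{i \le N}\int_{R_i} f\,d\lambda$ and

$$\begin{aligned} \Big|\sum_{i \le N}\Big(f(Y_i) - \int f\,d\lambda\Big)\Big| &= \Big|\sum_{i \le N} f(Y_i) - N\int f\,d\lambda\Big| \\ &\le \sum_{i \le N}\Big|f(Y_i) - N\int_{R_i} f\,d\lambda\Big| \\ &\le N\sum_{i \le N}\int_{R_i}|f(Y_i) - f(x)|\,d\lambda(x) . \end{aligned} \tag{4.42}$$

Since $f$ is 1-Lipschitz and $R_i$ is of diameter $\le L/\sqrt{N}$, we have $|f(Y_i) - f(x)| \le L/\sqrt{N}$ when $x \in R_i$. Since $\sum_{i \le N}\lambda(R_i) = 1$, this proves the claim.
Now, using (4.39) and taking expectation,

$$\mathrm{E}\inf_\pi\sum_{i \le N} d(X_i, Y_{\pi(i)}) \le L\sqrt{N} + \mathrm{E}\sup_{f \in \mathcal{C}}\Big|\sum_{i \le N}\Big(f(X_i) - \int f\,d\lambda\Big)\Big| \le L\sqrt{N\log N}$$

by (4.38). □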
4.5.1 The Long and Instructive Way


S. Bobkov and M. Ledoux recently found [19] a magically simple proof of 
Theorem 4.5.3. We will present it in Sect. 4.5.2. This proof relies on very specific 
features, and it is unclear whether it will apply to other matching theorems. In
the present section, we write a far more pedestrian (but far more instructive) proof 
with the general result Theorem 6.8.3 in mind. 

To prove Theorem 4.5.3, the overall strategy is clear. We think of the left-hand side as $\mathrm{E}\sup_{f \in \mathcal{C}}|Z_f|$, where $Z_f$ is the random variable of (4.35). We then find nice tail properties for these r.v.s, and we use the methods of Chap. 2. In the end (and because we are dealing with a deep fact), we shall have to prove some delicate “smallness” property of the class $\mathcal{C}$. This smallness property will ultimately be derived from the ellipsoid theorem. The (very beautiful) strategy for the hard part of the estimates relies on a kind of two-dimensional version of Proposition 4.1.8 and is outlined on page 129.

The class $\mathcal{C}$ of 1-Lipschitz functions on the unit square is not small in any sense for the simple reason that it contains all the constant functions. However, the expression $\sum_{i \le N}(f(X_i) - \int f\,d\lambda)$ does not change if we replace $f$ by $f + a$ where $a$ is a constant. In particular

$$\sup_{f \in \mathcal{C}}\Big|\sum_{i \le N}\Big(f(X_i) - \int f\,d\lambda\Big)\Big| = \sup_{f \in \widehat{\mathcal{C}}}\Big|\sum_{i \le N}\Big(f(X_i) - \int f\,d\lambda\Big)\Big| ,$$

where we define $\widehat{\mathcal{C}}$ as the set of 1-Lipschitz functions on the unit square for which $f(1/2, 1/2) = 0$.¹⁰ The gain is that we now may hope that $\widehat{\mathcal{C}}$ is small in the appropriate sense. To prove Theorem 4.5.3, we will prove the following:


Theorem 4.5.5 We have

$$\mathrm{E}\sup_{f \in \widehat{\mathcal{C}}}\Big|\sum_{i \le N}\Big(f(X_i) - \int f\,d\lambda\Big)\Big| \le L\sqrt{N\log N} . \tag{4.43}$$

The following fundamental classical result will allow us to control the tails of the r.v. $Z_f$ of (4.35). It will be used many times.


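Lemma 4.5.6 (Bernstein’s Inequality) Let $(W_i)_{i \ge 1}$ be independent r.v.s with $\mathrm{E}W_i = 0$, and consider a number $a$ with $|W_i| \le a$ for each $i$. Then, for $v > 0$,

$$\mathrm{P}\Big(\Big|\sum_{i \ge 1} W_i\Big| \ge v\Big) \le 2\exp\Big(-\min\Big(\frac{v^2}{4\sum_{i \ge 1}\mathrm{E}W_i^2}, \frac{v}{2a}\Big)\Big) . \tag{4.44}$$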


10 There is no real reason other than my own fancy to impose that the functions are zero right in 
the middle of the square. 
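Proof For $|x| \le 1$, we have

$$|e^x - 1 - x| \le x^2\sum_{k \ge 2}\frac{1}{k!} = x^2(e - 2) \le x^2 ,$$

and thus, since $\mathrm{E}W_i = 0$, for $a|\lambda| \le 1$, we have

$$|\mathrm{E}\exp\lambda W_i - 1| \le \lambda^2\,\mathrm{E}W_i^2 .$$

Therefore, $\mathrm{E}\exp\lambda W_i \le 1 + \lambda^2\mathrm{E}W_i^2 \le \exp\lambda^2\mathrm{E}W_i^2$, and thus

$$\mathrm{E}\exp\lambda\sum_{i \ge 1} W_i = \prod_{i \ge 1}\mathrm{E}\exp\lambda W_i \le \exp\Big(\lambda^2\sum_{i \ge 1}\mathrm{E}W_i^2\Big) .$$

Now, for $0 < \lambda \le 1/a$, we have

$$\mathrm{P}\Big(\sum_{i \ge 1} W_i \ge v\Big) \le \exp(-\lambda v)\,\mathrm{E}\exp\lambda\sum_{i \ge 1} W_i \le \exp\Big(\lambda^2\sum_{i \ge 1}\mathrm{E}W_i^2 - \lambda v\Big) .$$

If $av \le 2\sum_{i \ge 1}\mathrm{E}W_i^2$, we take $\lambda = v/(2\sum_{i \ge 1}\mathrm{E}W_i^2)$, obtaining a bound $\exp(-v^2/(4\sum_{i \ge 1}\mathrm{E}W_i^2))$. If $av > 2\sum_{i \ge 1}\mathrm{E}W_i^2$, we take $\lambda = 1/a$, and we note that

$$\frac{1}{a^2}\sum_{i \ge 1}\mathrm{E}W_i^2 - \frac{v}{a} \le \frac{v}{2a} - \frac{v}{a} = -\frac{v}{2a} ,$$

so that $\mathrm{P}(\sum_{i \ge 1} W_i \ge v) \le \exp(-\min(v^2/(4\sum_{i \ge 1}\mathrm{E}W_i^2), v/(2a)))$. Changing $W_i$ into $-W_i$ we obtain the same bound for $\mathrm{P}(\sum_{i \ge 1} W_i \le -v)$. □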
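Corollary 4.5.7 For each $v > 0$, we have

$$\mathrm{P}(|Z_f| \ge v) \le 2\exp\Big(-\min\Big(\frac{v^2}{4N\|f\|_2^2}, \frac{v}{4\|f\|_\infty}\Big)\Big) , \tag{4.45}$$

where $\|f\|_p$ denotes the norm of $f$ in $L^p(\lambda)$.

Proof We use Bernstein’s inequality with $W_i = f(X_i) - \int f\,d\lambda$ if $i \le N$ and $W_i = 0$ if $i > N$. We then observe that $\mathrm{E}W_i^2 \le \mathrm{E}f(X_i)^2 = \|f\|_2^2$ and $|W_i| \le 2\sup|f| = 2\|f\|_\infty$. □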
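A quick Monte Carlo sanity check of (4.45), for a function chosen by us for illustration: $f(x) = x_1$ on $[0,1]^2$. For this $f$, $\int f\,d\lambda = 1/2$, $\|f\|_2^2 = 1/3$ and $\|f\|_\infty = 1$. The sketch estimates $\mathrm{P}(|Z_f| \ge v)$ by simulation and prints it next to the right-hand side of (4.45). The bound is far from sharp for moderate $v$, which is expected from an inequality of this generality.

```python
import math
import random

def tail_bound(v, N, l2_sq, sup_norm):
    """Right-hand side of (4.45), given ||f||_2^2 = l2_sq and ||f||_inf = sup_norm."""
    return 2 * math.exp(-min(v * v / (4 * N * l2_sq), v / (4 * sup_norm)))

def empirical_tail(v, N, trials, rng):
    """Monte Carlo estimate of P(|Z_f| >= v) for f(x) = x_1, X_i uniform on [0, 1]^2.

    Only the first coordinate of X_i matters; it is uniform on [0, 1], and the
    integral of f is 1/2.
    """
    hits = 0
    for _ in range(trials):
        z = sum(rng.random() - 0.5 for _ in range(N))
        hits += abs(z) >= v
    return hits / trials

rng = random.Random(6)
N = 100
for v in (5.0, 10.0, 15.0, 20.0):
    print(v, empirical_tail(v, N, 2000, rng), round(tail_bound(v, N, 1 / 3, 1.0), 4))
```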


Let us then pretend for a while that in (4.45), the bound was instead $2\exp(-v^2/(4N\|f\|_2^2))$. Thus, we would be back to the problem we considered first, bounding the supremum of a stochastic process under the increment condition (2.4), where the distance on $\widehat{\mathcal{C}}$ is given by $d(f_1, f_2) = \sqrt{2N}\|f_1 - f_2\|_2$. The first thing to point out is that Theorem 4.5.3 is a prime example of a natural situation where using covering numbers does not yield the correct result, where we recall that for a metric space $(T, d)$, the covering number $N(T, d, \epsilon)$ denotes the smallest number of balls of radius $\epsilon$ that are needed to cover $T$. This is closely related to the fact that, as explained in Sect. 2.13, covering numbers do not describe well the size of ellipsoids. It is hard to formulate a theorem to the effect that covering numbers do not suffice, but the root of the problem is described in the next exercise, and a more precise version can be found later in Exercise 4.5.20.


Exercise 4.5.8 Prove that for each $0 < \epsilon \le 1$,

$$\log N(\widehat{\mathcal{C}}, d_2, \epsilon) \ge \frac{1}{L\epsilon^2} , \tag{4.46}$$

where $d_2$ denotes the distance in $L^2([0,1]^2)$. Hint: Consider an integer $n \ge 0$, and divide $[0,1]^2$ into $2^{2n}$ equal squares of area $2^{-2n}$. For every such square $C$, consider a number $\epsilon_C = \pm 1$. Consider then the function $f \in \widehat{\mathcal{C}}$ such that for $x \in C$, one has $f(x) = \epsilon_C d(x, B)$, where $B$ denotes the boundary of $C$. There are $2^{2^{2n}}$ such functions. Prove that by appropriate choices of the signs $\epsilon_C$, one may find at least $\exp(2^{2n}/L)$ functions of this type which are at mutual distance $\ge 2^{-n}/L$.


Since covering numbers do not suffice, we will appeal to the generic chaining, Theorem 2.7.2. As we will show later, in Exercise 4.5.21, we have $\gamma_2(\widehat{\mathcal{C}}, d_2) = \infty$. To overcome this issue, we will replace $\widehat{\mathcal{C}}$ by a sufficiently large finite subset $\mathcal{F} \subset \widehat{\mathcal{C}}$, for which we shall need the crucial estimate $\gamma_2(\mathcal{F}, d_2) \le L\sqrt{\log N}$. This will be done by proving that $\gamma_{2,2}(\widehat{\mathcal{C}}, d_2) < \infty$, where $\gamma_{2,2}$ is the functional of (4.5), so that $\gamma_{2,2}(\mathcal{F}, d_2) < \infty$, and appealing to Lemma 4.1.3.

The main ingredient toward the control of $\gamma_{2,2}(\widehat{\mathcal{C}}, d_2)$ is the following two-dimensional version of Proposition 4.1.8:

Lemma 4.5.9 Consider the space $\mathcal{C}^*$ of 1-Lipschitz functions on $[0,1]^2$ which are zero on the boundary of $[0,1]^2$. Then $\gamma_{2,2}(\mathcal{C}^*, d_2) < \infty$.


Proof We represent $\mathcal{C}^*$ as a subset of an ellipsoid using the Fourier transform. The Fourier transform associates with each function $f$ in $L^2([0,1]^2)$ the complex numbers $c_{p,q}(f)$ given by

$$c_{p,q}(f) = \iint_{[0,1]^2} f(x_1, x_2)\exp\big(2i\pi(px_1 + qx_2)\big)\,dx_1\,dx_2 . \tag{4.47}$$

The Plancherel formula

$$\|f\|_2 = \Big(\sum_{p,q \in \mathbb{Z}}|c_{p,q}(f)|^2\Big)^{1/2} \tag{4.48}$$
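asserts that the Fourier transform is an isometry, so that if

$$\mathcal{D} = \big\{(c_{p,q}(f))_{p,q \in \mathbb{Z}} \,;\, f \in \mathcal{C}^*\big\} ,$$

it suffices to show that $\gamma_{2,2}(\mathcal{D}, d) < \infty$ where $d$ is the distance in the complex Hilbert space $\ell^2_{\mathbb{C}}(\mathbb{Z} \times \mathbb{Z})$. Using (4.47) and integration by parts (the boundary terms vanish because $f$ is zero on the boundary of $[0,1]^2$), we get

$$-2i\pi p\,c_{p,q}(f) = c_{p,q}\Big(\frac{\partial f}{\partial x}\Big) .$$

Using (4.48) for $\partial f/\partial x$ and since $\|\partial f/\partial x\|_2 \le 1$, we get $\sum_{p,q \in \mathbb{Z}} p^2|c_{p,q}(f)|^2 \le 1/(4\pi^2)$. Proceeding similarly for $\partial f/\partial y$, we get

$$\mathcal{D} \subset \Big\{(c_{p,q}) \in \ell^2_{\mathbb{C}}(\mathbb{Z} \times \mathbb{Z}) \,;\, |c_{0,0}| \le 1 \,,\ \sum_{p,q \in \mathbb{Z}}(p^2 + q^2)|c_{p,q}|^2 \le 1\Big\} .$$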


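We view each complex number $c_{p,q}$ as a pair $(x_{p,q}, y_{p,q})$ of real numbers with $|c_{p,q}|^2 = x_{p,q}^2 + y_{p,q}^2$, so that $\mathcal{D}$ is isometric to a subset of

$$\mathcal{E} = \Big\{\big((x_{p,q}), (y_{p,q})\big) \in \mathbb{R}^{\mathbb{Z} \times \mathbb{Z}} \times \mathbb{R}^{\mathbb{Z} \times \mathbb{Z}} \,;\, x_{0,0}^2 + y_{0,0}^2 \le 1 \,,\ \sum_{p,q \in \mathbb{Z}}(p^2 + q^2)(x_{p,q}^2 + y_{p,q}^2) \le 1\Big\} . \tag{4.49}$$

For $u \ge 1$, we have

$$\operatorname{card}\{(p, q) \in \mathbb{Z} \times \mathbb{Z} \,;\, p^2 + q^2 \le u\} \le Lu .$$

We then deduce from Corollary 4.1.7 that $\gamma_{2,2}(\mathcal{E}, d) < \infty$. □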
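The counting estimate used at the end of this proof is elementary. The unit squares $[p, p+1] \times [q, q+1]$ attached to the lattice points with $p^2 + q^2 \le u$ are disjoint and lie in the disc of radius $\sqrt{u} + \sqrt{2}$, so for $u \ge 1$ one may take $L = \pi(1 + \sqrt{2})^2$. This argument is ours, not the text's. The sketch below counts the points exactly, column by column, and the ratio to $u$ approaches $\pi$.

```python
import math

def lattice_count(u):
    """card{(p, q) in Z x Z : p^2 + q^2 <= u} for a real u >= 0."""
    n = math.floor(u)                  # p^2 + q^2 is an integer, so only floor(u) matters
    total = 0
    for p in range(-math.isqrt(n), math.isqrt(n) + 1):
        m = math.isqrt(n - p * p)      # the admissible q are -m, ..., m
        total += 2 * m + 1
    return total

for u in (1, 10, 100, 10_000):
    c = lattice_count(u)
    print(u, c, round(c / u, 3))       # the ratio c / u approaches pi
```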
Proposition 4.5.10 We have $\gamma_{2,2}(\widehat{\mathcal{C}}, d_2) < \infty$.


I am grateful to R. van Handel who showed me the following simple arguments, which replace pages of gritty work in [132]. The basic idea is to deduce this from Lemma 4.5.9, essentially by showing that $\widehat{\mathcal{C}}$ is a Lipschitz image of a subset of $\mathcal{C}^*$, or more exactly of the clone considered in the next lemma.


Lemma 4.5.11 The set $\mathcal{C}^{**}$ of 1-Lipschitz functions on $[-1,2]^2$ which are zero on the boundary of this set satisfies $\gamma_{2,2}(\mathcal{C}^{**}, d^{**}) < \infty$, where $d^{**}$ is the distance induced by $L^2([-1,2]^2, d\lambda)$.

Proof This should be obvious from Lemma 4.5.9; we just perform the same construction on two squares of different sizes, $[0,1]^2$ and $[-1,2]^2$. □


Lemma 4.5.12 Each 1-Lipschitz function $f \in \widehat{\mathcal{C}}$ is the restriction to $[0,1]^2$ of a function $f^*$ of $\mathcal{C}^{**}$.

Proof A function $f \in \widehat{\mathcal{C}}$ may be extended to a 1-Lipschitz function $\tilde f$ on $\mathbb{R}^2$ by the formula $\tilde f(y) = \inf_{x \in [0,1]^2}(f(x) + d(x, y))$. Since $f(1/2, 1/2) = 0$ by definition of $\widehat{\mathcal{C}}$ and since $f$ is 1-Lipschitz, we have $|f(x)| \le 1/\sqrt{2} < 1$ for $x \in [0,1]^2$. The function $f^*(y) = \min(\tilde f(y), d(y, \mathbb{R}^2 \setminus [-1,2]^2))$ is 1-Lipschitz. Since each point of $[0,1]^2$ is at distance $\ge 1$ of $\mathbb{R}^2 \setminus [-1,2]^2$, $f^*$ coincides with $f$ on $[0,1]^2$, and it is zero on the boundary of $[-1,2]^2$ (where $\tilde f(y) \ge -1/\sqrt{2} + d(y, [0,1]^2) \ge 1 - 1/\sqrt{2} > 0$). □


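Proof of Proposition 4.5.10 To each function $f$ of $\mathcal{C}^{**}$, we associate its restriction $\varphi(f)$ to $[0,1]^2$. Since the map $\varphi$ is a contraction from $(\mathcal{C}^{**}, d^{**})$ to $L^2([0,1]^2)$, by Lemma 4.5.11, we have $\gamma_{2,2}(\varphi(\mathcal{C}^{**}), d_2) < \infty$, and by Lemma 4.5.12, we have $\widehat{\mathcal{C}} \subset \varphi(\mathcal{C}^{**})$. □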


Let us now come back to Earth and deal with the actual bound (4.45). For 
this, we develop an appropriate version of Theorem 2.7.2. It will be used many 
times. The ease with which one deals with two distances is remarkable. The proof 
of the theorem contains a principle which will be used many times: if we have 
two admissible sequences of partitions such that for each of them, the sets of the 
partition are small in a certain sense, then we can construct an admissible sequence
of partitions whose sets are small in both senses. 


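Theorem 4.5.13 Consider a set $T$ provided with two distances $d_1$ and $d_2$. Consider a centered process $(X_t)_{t \in T}$ which satisfies

$$\forall s, t \in T\,,\ \forall u > 0\,, \quad \mathrm{P}(|X_s - X_t| \ge u) \le 2\exp\Big(-\min\Big(\frac{u^2}{d_2(s,t)^2}, \frac{u}{d_1(s,t)}\Big)\Big) . \tag{4.50}$$

Then

$$\mathrm{E}\sup_{s,t \in T}|X_s - X_t| \le L\big(\gamma_1(T, d_1) + \gamma_2(T, d_2)\big) . \tag{4.51}$$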


This theorem will be applied when $d_1$ is the $\ell^\infty$ distance, but it sounds funny, when considering two distances, to call them $d_2$ and $d_\infty$.


Proof We denote by $\Delta_j(A)$ the diameter of the set $A$ for $d_j$. We consider an admissible sequence $(\mathcal{B}_n)_{n \ge 0}$ such that¹¹

$$\forall t \in T\,, \quad \sum_{n \ge 0} 2^n\Delta_1(B_n(t)) \le 2\gamma_1(T, d_1) \tag{4.52}$$


11 The factor 2 in the right-hand side below is just in case the infimum over all partitions is not attained.
and an admissible sequence $(\mathcal{C}_n)_{n \ge 0}$ such that

$$\forall t \in T\,, \quad \sum_{n \ge 0} 2^{n/2}\Delta_2(C_n(t)) \le 2\gamma_2(T, d_2) . \tag{4.53}$$


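Here $B_n(t)$ is the unique element of $\mathcal{B}_n$ that contains $t$ (etc.). We define partitions $\mathcal{A}_n$ of $T$ as follows: we set $\mathcal{A}_0 = \{T\}$, and, for $n \ge 1$, we define $\mathcal{A}_n$ as the partition generated by $\mathcal{B}_{n-1}$ and $\mathcal{C}_{n-1}$, i.e., the partition that consists of the sets $B \cap C$ for $B \in \mathcal{B}_{n-1}$ and $C \in \mathcal{C}_{n-1}$. Thus $\operatorname{card}\mathcal{A}_n \le N_{n-1}^2 \le N_n$, and the sequence $(\mathcal{A}_n)$ is admissible. We then choose for each $n \ge 0$ a set $T_n$ such that $\operatorname{card}T_n \le N_n$ which meets all the sets in $\mathcal{A}_n$. It is convenient to reformulate (4.50) as follows: when $u \ge 1$, we have

$$\forall s, t \in T\,, \quad \mathrm{P}\big(|X_s - X_t| \ge u^2 d_1(s, t) + u\,d_2(s, t)\big) \le 2\exp(-u^2) .$$

We then copy the proof of (2.34), replacing (2.31) by

$$\forall t\,, \quad |X_{\pi_n(t)} - X_{\pi_{n-1}(t)}| \le u^2 2^n d_1(\pi_n(t), \pi_{n-1}(t)) + u\,2^{n/2} d_2(\pi_n(t), \pi_{n-1}(t)) . \qquad \square$$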
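The partition principle at the heart of this proof is easy to make concrete on a finite set. The sketch below is our own illustration. Partitions are lists of frozensets, $N_0 = 1$ and $N_n = 2^{2^n}$ for $n \ge 1$, and the two admissible sequences (blocks of consecutive integers, residue classes) are arbitrary examples. The code forms $\mathcal{A}_n$ from $\mathcal{B}_{n-1}$ and $\mathcal{C}_{n-1}$ as above and checks that the result is again an increasing sequence of partitions with $\operatorname{card}\mathcal{A}_n \le N_n$.

```python
from itertools import product

def N(n):
    """N_0 = 1 and N_n = 2^(2^n) for n >= 1."""
    return 1 if n == 0 else 2 ** (2 ** n)

def partition_by(T, key):
    """Partition of the finite set T into the level sets of key."""
    groups = {}
    for t in T:
        groups.setdefault(key(t), set()).add(t)
    return [frozenset(g) for g in groups.values()]

def refine(P, Q):
    """Partition generated by P and Q: the nonempty sets B & C, B in P, C in Q."""
    return [B & C for B, C in product(P, Q) if B & C]

def combine(Bs, Cs, T):
    """A_0 = {T} and, for n >= 1, A_n = partition generated by B_{n-1} and C_{n-1}."""
    return [[frozenset(T)]] + [refine(Bs[n - 1], Cs[n - 1]) for n in range(1, len(Bs) + 1)]

T = range(64)
Bs = [partition_by(T, lambda t: 0), partition_by(T, lambda t: t // 16),
      partition_by(T, lambda t: t // 4), partition_by(T, lambda t: t)]
Cs = [partition_by(T, lambda t: 0), partition_by(T, lambda t: t % 4),
      partition_by(T, lambda t: t % 16), partition_by(T, lambda t: t)]
As = combine(Bs, Cs, T)
print([len(A) for A in As])                              # -> [1, 1, 16, 64, 64]
print(all(len(A) <= N(n) for n, A in enumerate(As)))     # -> True
```

The index shift (using $\mathcal{B}_{n-1}$ and $\mathcal{C}_{n-1}$ to build $\mathcal{A}_n$) is what keeps the count within $N_{n-1}^2 = N_n$.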


Exercise 4.5.14 The purpose of this exercise is to deduce Theorem 4.5.13 from Theorem 2.7.14.

(a) Prove that if for some numbers $A, B > 0$ a r.v. $Y \ge 0$ satisfies

$$\mathrm{P}(Y \ge u) \le 2\exp\Big(-\min\Big(\frac{u^2}{A^2}, \frac{u}{B}\Big)\Big) ,$$

then for $p \ge 1$, we have $\|Y\|_p \le L(A\sqrt{p} + Bp)$.

(b) We denote by $D_n(A)$ the diameter of a subset $A$ of $T$ for the distance $\delta_n(s, t) = \|X_s - X_t\|_{2^n}$. Prove that under the conditions of Theorem 4.5.13, there exists an admissible sequence of partitions $(\mathcal{A}_n)$ such that

$$\sup_{t \in T}\sum_{n \ge 0} D_n(A_n(t)) \le L\big(\gamma_1(T, d_1) + \gamma_2(T, d_2)\big) . \tag{4.54}$$


Exercise 4.5.15 Consider a space $T$ equipped with two different distances $d_1$ and $d_2$. Prove that

$$\gamma_2(T, d_1 + d_2) \le L\big(\gamma_2(T, d_1) + \gamma_2(T, d_2)\big) . \tag{4.55}$$


We can now state a general bound, from which we will deduce Theorem 4.5.3. 


Theorem 4.5.16 Consider a class $\mathcal{F}$ of functions on $[0,1]^2$, and assume that $0 \in \mathcal{F}$. Then

$$\mathrm{E}\sup_{f \in \mathcal{F}}\Big|\sum_{i \le N}\Big(f(X_i) - \int f\,d\lambda\Big)\Big| \le L\big(\sqrt{N}\gamma_2(\mathcal{F}, d_2) + \gamma_1(\mathcal{F}, d_\infty)\big) , \tag{4.56}$$

where $d_2$ and $d_\infty$ are the distances induced on $\mathcal{F}$ by the norms of $L^2$ and $L^\infty$, respectively.


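Proof Combining Corollary 4.5.7 with Theorem 4.5.13, we get, since $0 \in \mathcal{F}$ and $Z_0 = 0$,

$$\mathrm{E}\sup_{f \in \mathcal{F}}|Z_f| \le \mathrm{E}\sup_{f, f' \in \mathcal{F}}|Z_f - Z_{f'}| \le L\big(\gamma_2(\mathcal{F}, 2\sqrt{N}d_2) + \gamma_1(\mathcal{F}, 4d_\infty)\big) . \tag{4.57}$$

Finally, $\gamma_2(\mathcal{F}, 2\sqrt{N}d_2) = 2\sqrt{N}\gamma_2(\mathcal{F}, d_2)$ and $\gamma_1(\mathcal{F}, 4d_\infty) = 4\gamma_1(\mathcal{F}, d_\infty)$. □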


Exercise 4.5.17 Try to prove (4.25) now. Hint: Consider $\mathcal{F} = \{\mathbf{1}_{[0,k/N]} \,;\, k \le N\}$. Use Exercise 2.7.5 and entropy numbers.


In the situation which interests us, there will be plenty of room to control the term $\gamma_1(\mathcal{F}, d_\infty)$, and this term is a lower-order term, which can be considered as a simple nuisance. For this term, entropy numbers suffice. To control these, we first state a general principle, which was already known to Kolmogorov.


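Lemma 4.5.18 Consider a metric space $(U, d)$, and assume that for certain numbers $B$ and $\alpha \ge 1$ and each $0 < \epsilon \le B$, we have

$$N(U, d, \epsilon) \le \Big(\frac{B}{\epsilon}\Big)^\alpha . \tag{4.58}$$

Consider the set $\mathcal{B}$ of 1-Lipschitz functions $f$ on $U$ with $\|f\|_\infty \le B$. Then for each $\epsilon > 0$, we have

$$\log N(\mathcal{B}, d_\infty, \epsilon) \le K(\alpha)\Big(\frac{B}{\epsilon}\Big)^\alpha , \tag{4.59}$$

where $K(\alpha)$ depends only on $\alpha$. In particular,

$$e_n(\mathcal{B}, d_\infty) \le K(\alpha)B2^{-n/\alpha} . \tag{4.60}$$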


Proof By homogeneity, we may and do assume that $B = 1$. Using (4.58) for $\epsilon = 2^{-n}$, for each $n \ge 0$, consider a set $V_n \subset U$ with $\operatorname{card}V_n \le 2^{\alpha n}$ such that any point of $U$ is within distance $2^{-n}$ of a point of $V_n$. We define on $\mathcal{B}$ the distance $d_n$ by $d_n(f, g) = \max_{x \in V_n}|f(x) - g(x)|$. We prove first that

$$d_\infty(f, g) \le 2^{-n+1} + d_n(f, g) . \tag{4.61}$$

Indeed, for any $x \in U$, we can find $y \in V_n$ with $d(x, y) \le 2^{-n}$ and then $|f(x) - g(x)| \le 2^{-n+1} + |f(y) - g(y)| \le 2^{-n+1} + d_n(f, g)$.
Denote by $W_n(f, r)$ the ball for $d_n$ of center $f$ and radius $r$. We claim that

$$W_{n-1}(f, 2^{-n+1}) \subset W_n(f, 2^{-n+3}) . \tag{4.62}$$
Indeed, using (4.61) for $n - 1$ rather than $n$, we see that $d_n(f, g) \le d_\infty(f, g) \le 2^{-n+2} + d_{n-1}(f, g) \le 2^{-n+3}$ for $g \in W_{n-1}(f, 2^{-n+1})$.
Next, we claim that

$$N\big(W_n(f, 2^{-n+3}), d_n, 2^{-n}\big) \le L^{\operatorname{card}V_n} . \tag{4.63}$$

Since $d_n(f, g) = \|\varphi_n(f) - \varphi_n(g)\|_\infty$ where $\varphi_n(f) = (f(x))_{x \in V_n}$, we are actually working here in $\mathbb{R}^{\operatorname{card}V_n}$, and (4.63) is a consequence of (2.47): in $\mathbb{R}^{\operatorname{card}V_n}$, we are covering a ball of radius $2^{-n+3}$ by balls of radius $2^{-n}$.

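Covering $\mathcal{B}$ by $N(\mathcal{B}, d_{n-1}, 2^{-n+1})$ balls $W_{n-1}(f, 2^{-n+1})$ and hence by $N(\mathcal{B}, d_{n-1}, 2^{-n+1})$ balls $W_n(f, 2^{-n+3})$, and then covering each of these by $N(W_n(f, 2^{-n+3}), d_n, 2^{-n}) \le L^{\operatorname{card}V_n}$ balls for $d_n$ of radius $2^{-n}$, we obtain

$$N(\mathcal{B}, d_n, 2^{-n}) \le L^{\operatorname{card}V_n}\,N(\mathcal{B}, d_{n-1}, 2^{-n+1}) . \tag{4.64}$$

Since $\operatorname{card}V_n \le 2^{\alpha n}$, iteration of (4.64) proves that $\log N(\mathcal{B}, d_n, 2^{-n}) \le K2^{\alpha n}$. Finally, (4.61) (used for $n - 1$) implies that

$$\log N(\mathcal{B}, d_\infty, 2^{-n+3}) \le \log N(\mathcal{B}, d_{n-1}, 2^{-n+1}) \le K2^{\alpha n}$$

and concludes the proof. □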
We apply the previous lemma to $U = [0,1]^2$, which obviously satisfies (4.58) for $\alpha = 2$, so that (4.60) implies that for $n \ge 0$,

$$e_n(\widehat{\mathcal{C}}, d_\infty) \le L2^{-n/2} . \tag{4.65}$$


Proposition 4.5.19 We have

$$\mathrm{E}\sup_{f \in \widehat{\mathcal{C}}}\Big|\sum_{i \le N}\Big(f(X_i) - \int f\,d\lambda\Big)\Big| \le L\sqrt{N\log N} . \tag{4.66}$$


An interesting feature of this proof is that it does not work to try to use (4.56) directly. Rather we will use (4.56) for an appropriate subset $T$ of $\widehat{\mathcal{C}}$, which can be thought of as the “main part” of $\widehat{\mathcal{C}}$, and for the “rest” of $\widehat{\mathcal{C}}$, we will use other (and much cruder) bounds. This method is not artificial. As we will learn much later, in Theorem 6.8.3, when properly used, it always yields the best possible estimates.


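Proof Consider the largest integer $m$ with $2^{-m} \ge 1/N$. By (4.65), we may find a subset $T$ of $\widehat{\mathcal{C}}$ with $\operatorname{card}T \le N_m$ and

$$\forall f \in \widehat{\mathcal{C}}\,, \quad d_\infty(f, T) \le L2^{-m/2} \le L/\sqrt{N} .$$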
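Thus for each $f \in \widehat{\mathcal{C}}$, consider $\tilde f \in T$ with $d_\infty(f, \tilde f) \le L/\sqrt{N}$. Then

$$|Z_f| \le |Z_{\tilde f}| + |Z_f - Z_{\tilde f}| = |Z_{\tilde f}| + |Z_{f - \tilde f}| \le |Z_{\tilde f}| + L\sqrt{N} ,$$

where we have used the obvious inequality $|Z_{f - \tilde f}| \le 2N d_\infty(f, \tilde f)$. Since $\tilde f \in T$, we obtain

$$\mathrm{E}\sup_{f \in \widehat{\mathcal{C}}}|Z_f| \le \mathrm{E}\sup_{f \in T}|Z_f| + L\sqrt{N} . \tag{4.67}$$

To prove (4.66), it suffices to show that

$$\mathrm{E}\sup_{f \in T}|Z_f| \le L\sqrt{N\log N} . \tag{4.68}$$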


Proposition 4.5.10 and Lemma 4.1.3 imply $\gamma_2(T, d_2) \le L\sqrt{m} \le L\sqrt{\log N}$. Now, as in (2.56), we have

$$\gamma_1(T, d_\infty) \le L\sum_{n \ge 0} 2^n e_n(T, d_\infty) .$$

Since $e_n(T, d_\infty) = 0$ for $n \ge m$, (4.65) yields $\gamma_1(T, d_\infty) \le L2^{m/2} \le L\sqrt{N}$.

Thus (4.68) follows from Theorem 4.5.16 and this completes the proof. □


Exercise 4.5.20 Use Exercise 4.5.8 to prove that Dudley’s bound cannot yield better than the estimate $\gamma_2(T, d_2) \le L\log N$.


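Exercise 4.5.21 Assuming $\gamma_2(\mathcal{C}^*, d_2) < \infty$, show that the previous arguments prove that

$$\mathrm{E}\sup_{f \in \mathcal{C}^*}\Big|\sum_{i \le N}\Big(f(X_i) - \int f\,d\lambda\Big)\Big| \le L\sqrt{N}\big(1 + \gamma_2(\mathcal{C}^*, d_2)\big) .$$

Comparing with (4.78), conclude that $\gamma_2(\mathcal{C}^*, d_2) = \infty$. Convince yourself that the separated trees implicitly constructed in the proof of (4.78) also witness this.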


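Exercise 4.5.22 Suppose now that you are in dimension $d = 3$. Prove that $\mathrm{E}\sup_{f \in \widehat{\mathcal{C}}}|\sum_{i \le N}(f(X_i) - \int f\,d\lambda)| \le LN^{2/3}$. Hint: According to Lemma 4.5.18, we have $e_n(\widehat{\mathcal{C}}, d_\infty) \le L2^{-n/3}$. This is the only estimate you need, using the trivial fact that $e_n(\widehat{\mathcal{C}}, d_2) \le e_n(\widehat{\mathcal{C}}, d_\infty)$.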


Exercise 4.5.23 Consider the space $T = \{0,1\}^{\mathbb{N}}$ provided with the distance $d(t, t') = 2^{-j/2}$, where $j = \min\{i \ge 1 \,;\, t_i \ne t'_i\}$ for $t = (t_i)_{i \ge 1}$. This space somewhat resembles the unit square, in the sense that $N(T, d, \epsilon) \le L\epsilon^{-2}$ for $\epsilon \le 1$. Prove that if $(X_i)_{i \le N}$ are i.i.d. uniformly distributed in $T$ and $(Y_i)_{i \le N}$ are uniformly spread (in a manner which is left to the reader to define precisely), then

$$\mathrm{E}\inf_\pi\sum_{i \le N} d(X_i, Y_{\pi(i)}) \le L\sqrt{N}\log N , \tag{4.69}$$

where the infimum is over all the permutations of $\{1, \ldots, N\}$. Hint: You can do this from scratch, and for this, covering numbers suffice, e.g., in the form of (4.59). The method of Exercise 4.5.2 also works here. In Exercise 4.6.8, you will be asked to prove that this bound is of the correct order.


4.5.2 The Short and Magic Way 


We now start studying the Bobkov-Ledoux approach [19], which considerably simplifies the proofs of previous results such as the following one:


Theorem 4.5.24 ([110]) Consider the class $\mathcal C^*$ of 1-Lipschitz functions on $[0,1]^2$ which are zero on the boundary of $[0,1]^2$. Consider points $(z_i)_{i\le N}$ in $[0,1]^2$ and standard independent Gaussian r.v.s $g_i$. Then

$$\mathsf E\sup_{f\in\mathcal C^*}\Big|\sum_{i\le N}g_if(z_i)\Big|^2 \le LN\log N. \tag{4.70}$$


It should be obvious from Lemma 4.5.12 that in the previous result, one may replace $\mathcal C^*$ by the class $\mathcal C$ of Theorem 4.5.5. The following improves on Theorem 4.5.3:


Corollary 4.5.25 Consider an independent sequence $(X_i)_{i\le N}$ of r.v.s valued in $[0,1]^2$. (It is not assumed that these r.v.s have the same distribution.) Then

$$\mathsf E\sup_{f\in\mathcal C}\Big|\sum_{i\le N}\big(f(X_i)-\mathsf Ef(X_i)\big)\Big| \le L\sqrt{N\log N}. \tag{4.71}$$


Proof Consider i.i.d. standard Gaussian r.v.s $g_i$. Taking first expectation in the $g_i$ given the $X_i$, it follows from Theorem 4.5.24 (or more accurately from the version of this theorem for the class $\mathcal C$) that $\mathsf E\sup_{f\in\mathcal C}|\sum_{i\le N}g_if(X_i)|^2 \le LN\log N$. The Cauchy-Schwarz inequality yields $\mathsf E\sup_{f\in\mathcal C}|\sum_{i\le N}g_if(X_i)| \le L\sqrt{N\log N}$. We will learn later the simple tools which allow us to deduce (4.71) from this inequality, in particular the Giné-Zinn inequalities and specifically (11.35) (which has to be combined with (6.6)). □


Let us consider an integer $n \ge \sqrt N$ and the set

$$G = \{(k/n,\ell/n);\ 0 \le k,\ell \le n-1\}.$$

Using the fact that the functions $f \in \mathcal C^*$ are 1-Lipschitz and replacing each point $z_i$ by the closest point in $G$ (which is at distance $\le \sqrt2/n \le L/\sqrt N$ of $z_i$), the following is obvious. Indeed, the change in $\sum_{i\le N}g_if(z_i)$ is at most $(L/\sqrt N)\sum_{i\le N}|g_i|$, whose square has expectation $\le LN$.


Lemma 4.5.26 To prove Theorem 4.5.24, we may assume that each $z_i \in G$.


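Let us define an ellipsoid $\mathcal E$ in $\mathbb R^N$ as a set $\mathcal E = \{\sum_{k\ge1}a_ku_k\}$, where $(u_k)_{k\ge1}$ is a given sequence in $\mathbb R^N$ and where $(a_k)_{k\ge1}$ varies over all the possible sequences with $\sum_{k\ge1}a_k^2 \le 1$.¹² For $t \in \mathbb R^N$, we write $X_t = \sum_{i\le N}t_ig_i$ as usual.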


Lemma 4.5.27 We have

$$\mathsf E\sup_{t\in\mathcal E}|X_t|^2 \le \sum_{k\ge1}\|u_k\|^2.$$


Proof This is exactly the same argument as to prove (2.157). For $t = \sum_{k\ge1}a_ku_k \in \mathcal E$ we have $X_t = \sum_{k\ge1}a_kX_{u_k}$, so that by the Cauchy-Schwarz inequality we have $X_t^2 \le \sum_{k\ge1}X_{u_k}^2$. The result follows by taking the supremum in $t$ and then expectation, since $\mathsf EX_{u_k}^2 = \|u_k\|^2$. □
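The proof rests on the identity $\sup_{t\in\mathcal E}X_t^2 = \sum_k X_{u_k}^2$. This is Cauchy-Schwarz, with equality when $a_k$ is proportional to $X_{u_k}$. The minimal Monte Carlo sketch below computes that supremum in closed form for a finite family $u_1,\dots,u_K$ and compares its empirical mean with $\sum_k\|u_k\|^2$. The random vectors $u_k$ and the helper names `sup_sq_over_ellipsoid` and `monte_carlo_check` are hypothetical choices made only for illustration.

```python
import random


def sup_sq_over_ellipsoid(us, g):
    """Return sup of X_t^2 over t = sum_k a_k u_k with sum_k a_k^2 <= 1.

    X_t = sum_i t_i g_i is linear in t, so X_t = sum_k a_k X_{u_k}.  By
    Cauchy-Schwarz the supremum of X_t^2 is sum_k X_{u_k}^2, attained when
    a_k is proportional to X_{u_k}.
    """
    x_u = [sum(ui * gi for ui, gi in zip(u, g)) for u in us]
    return sum(x * x for x in x_u)


def monte_carlo_check(num_vectors=5, dim=8, trials=20000, seed=0):
    """Return (empirical mean of sup_t X_t^2, sum_k ||u_k||^2)."""
    rng = random.Random(seed)
    # Hypothetical vectors u_k with random entries; any finite family would do.
    us = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_vectors)]
    bound = sum(sum(c * c for c in u) for u in us)  # sum_k ||u_k||^2
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += sup_sq_over_ellipsoid(us, g)
    return total / trials, bound


if __name__ == "__main__":
    estimate, bound = monte_carlo_check()
    print(f"E sup X_t^2 ~ {estimate:.3f}   sum ||u_k||^2 = {bound:.3f}")
```

In this finite setting the inequality of Lemma 4.5.27 is in fact an equality, so the two printed numbers should agree up to Monte Carlo error.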


To prove Theorem 4.5.24, we will show that the set $\{(f(z_i))_{i\le N};\ f \in \mathcal C^*\}$ is a subset of an appropriate ellipsoid.

For this, we identify $G$ with the group $\mathbb Z_n \times \mathbb Z_n$, where $\mathbb Z_n = \mathbb Z/n\mathbb Z$, with the idea to use Fourier analysis in $G$. We keep in mind that a function on $[0,1]^2$ which is zero on the boundary of this set will give rise to a function on $\mathbb Z_n \times \mathbb Z_n$. Consider the elements $\tau_1 = (1,0)$ and $\tau_2 = (0,1)$ of $G$. For a function $f: G \to \mathbb R$, we define the functions $f_1(t) = f(t+\tau_1) - f(t)$ and $f_2(t) = f(t+\tau_2) - f(t)$, and the class $\mathcal C$ of functions $G \to \mathbb R$ which satisfy

$$\forall t\in G,\ |f(t)| \le 1;\qquad \forall t\in G,\ |f_1(t)| \le 1/n;\ |f_2(t)| \le 1/n. \tag{4.72}$$


Thus, seeing the functions of $\mathcal C^*$ as functions on $G$, they belong to $\mathcal C$. Let us denote by $\hat G$ the set of characters $\chi$ on $G$.¹³ The Fourier transform $\hat f$ of a function $f$ on $G$ is the function $\hat f$ on $\hat G$ given by $\hat f(\chi) = (\mathrm{card}\,G)^{-1}\sum_{t\in G}f(t)\chi(t)$, where we recall that $|\chi(t)| = 1$. One then has the Fourier expansion

$$f = \sum_{\chi\in\hat G}\hat f(\chi)\overline{\chi} \tag{4.73}$$

and the Plancherel formula

$$\frac1{\mathrm{card}\,G}\sum_{t\in G}|f(t)|^2 = \sum_{\chi\in\hat G}|\hat f(\chi)|^2. \tag{4.74}$$

¹² The name is justified: a bit of algebra allows one to show that such a set is an ellipsoid in the usual sense, but we do not need that.

¹³ A character of a group $G$ is a group homomorphism from $G$ to the unit circle.


The key to the argument is the following: 


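Proposition 4.5.28 There exist positive numbers $(c(\chi))_{\chi\in\hat G}$ such that

$$\sum_{\chi\in\hat G}\frac1{c(\chi)} \le L\log n \tag{4.75}$$

and

$$\forall f\in\mathcal C,\quad \sum_{\chi\in\hat G}c(\chi)|\hat f(\chi)|^2 \le 1. \tag{4.76}$$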


We start the preparations for the proof of Proposition 4.5.28. The following lemma performs integration by parts:

Lemma 4.5.29 For each function $f$ on $G$ and every $\chi \in \hat G$, we have $\hat f_1(\chi) = (\chi(-\tau_1)-1)\hat f(\chi)$.


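Proof Since $\sum_{t\in G}f(t+\tau_1)\chi(t) = \sum_{t\in G}f(t)\chi(t-\tau_1)$ by change of variable, we have

$$(\mathrm{card}\,G)\hat f_1(\chi) = \sum_{t\in G}\big(f(t+\tau_1)-f(t)\big)\chi(t) = \sum_{t\in G}f(t)\big(\chi(t-\tau_1)-\chi(t)\big) = (\chi(-\tau_1)-1)\sum_{t\in G}f(t)\chi(t) = (\chi(-\tau_1)-1)(\mathrm{card}\,G)\hat f(\chi),$$

where we have used in the third equality that $\chi(t-\tau_1) = \chi(t)\chi(-\tau_1)$. □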
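As a numerical sanity check of the conventions, the sketch below works on $G = \mathbb Z_n \times \mathbb Z_n$ for a small $n$ and a random real function $f$. It uses the transform $\hat f(\chi) = (\mathrm{card}\,G)^{-1}\sum_t f(t)\chi(t)$ from the text and verifies three things: Lemma 4.5.29, the Plancherel formula (4.74) and the expansion (4.73). It is a sketch only. The characters are parametrized as $\chi_{p,q}(a,b) = \exp(2i\pi(ap+bq)/n)$, as in the proof of Proposition 4.5.28, and the helper names are hypothetical.

```python
import cmath
import random


def character(n, p, q, a, b):
    """chi_{p,q}(a, b) = exp(2 i pi (a p + b q) / n) on G = Z_n x Z_n."""
    return cmath.exp(2j * cmath.pi * (a * p + b * q) / n)


def fourier(f, n):
    """hat f(chi_{p,q}) = (card G)^{-1} sum_t f(t) chi_{p,q}(t), the convention of the text."""
    return {
        (p, q): sum(f[a][b] * character(n, p, q, a, b)
                    for a in range(n) for b in range(n)) / n ** 2
        for p in range(n) for q in range(n)
    }


def check_identities(n=5, seed=0):
    """Largest numerical errors in Lemma 4.5.29, (4.74) and (4.73) for a random f."""
    rng = random.Random(seed)
    f = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    # f_1(t) = f(t + tau_1) - f(t), with tau_1 = (1, 0) and indices taken mod n.
    f1 = [[f[(a + 1) % n][b] - f[a][b] for b in range(n)] for a in range(n)]
    fh, f1h = fourier(f, n), fourier(f1, n)
    # Lemma 4.5.29: hat f_1(chi) = (chi(-tau_1) - 1) hat f(chi).
    lemma_err = max(abs(f1h[(p, q)] - (character(n, p, q, -1, 0) - 1) * fh[(p, q)])
                    for (p, q) in fh)
    # Plancherel formula (4.74).
    mean_sq = sum(f[a][b] ** 2 for a in range(n) for b in range(n)) / n ** 2
    plancherel_err = abs(mean_sq - sum(abs(v) ** 2 for v in fh.values()))
    # Fourier expansion (4.73): f = sum_chi hat f(chi) * conj(chi).
    inversion_err = max(
        abs(f[a][b] - sum(fh[(p, q)] * character(n, p, q, a, b).conjugate()
                          for (p, q) in fh))
        for a in range(n) for b in range(n)
    )
    return lemma_err, plancherel_err, inversion_err


if __name__ == "__main__":
    print(check_identities())
```

All three errors should be at the level of floating-point rounding. Note that with this sign convention the expansion (4.73) uses the conjugate $\overline\chi$.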


Corollary 4.5.30 For $f \in \mathcal C$, we have

$$\sum_{\chi\in\hat G}\big(|\chi(-\tau_1)-1|^2 + |\chi(-\tau_2)-1|^2\big)|\hat f(\chi)|^2 \le \frac2{n^2}. \tag{4.77}$$


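Proof Using Lemma 4.5.29 and then the Plancherel formula (4.74) and (4.72), we obtain

$$\sum_{\chi\in\hat G}|\chi(-\tau_1)-1|^2|\hat f(\chi)|^2 = \sum_{\chi\in\hat G}|\hat f_1(\chi)|^2 = \frac1{\mathrm{card}\,G}\sum_{t\in G}|f_1(t)|^2 \le \frac1{n^2},$$

and we proceed similarly for $\tau_2$. □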
Proof of Proposition 4.5.28 For $\chi \in \hat G$, let us set $c(\chi) = 1/2$ if $\chi$ is the constant character $\chi_0$ equal to 1, and otherwise

$$c(\chi) = \frac{n^2}4\big(|\chi(-\tau_1)-1|^2 + |\chi(-\tau_2)-1|^2\big).$$

Then, since $|\hat f(\chi_0)| \le 1$ because $|f(t)| \le 1$ for each $t$, and using (4.77) in the second inequality,

$$\sum_{\chi\in\hat G}c(\chi)|\hat f(\chi)|^2 = \frac12|\hat f(\chi_0)|^2 + \sum_{\chi\ne\chi_0}c(\chi)|\hat f(\chi)|^2 \le \frac12 + \frac{n^2}4\cdot\frac2{n^2} = 1,$$
and this proves (4.76). To prove (4.75), we use that $\hat G$ is exactly the set of characters of the type $\chi_{p,q}(a,b) = \exp(2i\pi(ap+bq)/n)$, where $0 \le p,q \le n-1$. Thus $\chi_{p,q}(-\tau_1) = \exp(-2i\pi p/n)$ and $\chi_{p,q}(-\tau_2) = \exp(-2i\pi q/n)$. Now, for $0 \le x \le 1$, we have $|1-\exp(-2i\pi x)| \ge \min(x,1-x)$, so that

$$|\chi_{p,q}(-\tau_1)-1| \ge \frac1n\min(p,n-p);\qquad |\chi_{p,q}(-\tau_2)-1| \ge \frac1n\min(q,n-q).$$

Thus

$$\sum_{\chi\ne\chi_0}\frac1{c(\chi)} \le \sum\frac4{\min(p,n-p)^2+\min(q,n-q)^2},$$

where the sum is over $0 \le p,q \le n-1$ with $(p,q) \ne (0,0)$. Distinguishing whether $p \le n/2$ or $p \ge n/2$ (and similarly for $q$), we obtain

$$\sum\frac1{\min(p,n-p)^2+\min(q,n-q)^2} \le 4\sum\frac1{p^2+q^2},$$

where the sum is over the same set. This sum is $\le L\log n$, which, together with $1/c(\chi_0) = 2$, proves (4.75). □
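The logarithmic growth in (4.75) is easy to observe numerically. The sketch below evaluates $\sum_{\chi\in\hat G}1/c(\chi)$ exactly for the weights defined in the proof. It uses $|\chi_{p,q}(-\tau_1)-1|^2 = |e^{-2i\pi p/n}-1|^2 = 4\sin^2(\pi p/n)$, and prints the ratio to $\log n$. The function name is hypothetical, and the natural logarithm is assumed.

```python
import math


def sum_inverse_c(n):
    """Sum over all characters chi_{p,q} of 1 / c(chi), with c(chi) as in the proof."""
    total = 2.0  # constant character chi_0: c(chi_0) = 1/2
    for p in range(n):
        for q in range(n):
            if (p, q) == (0, 0):
                continue
            # |chi_{p,q}(-tau_1) - 1|^2 = |exp(-2 i pi p / n) - 1|^2 = 4 sin^2(pi p / n)
            d1 = 4.0 * math.sin(math.pi * p / n) ** 2
            d2 = 4.0 * math.sin(math.pi * q / n) ** 2
            total += 1.0 / (n * n / 4.0 * (d1 + d2))
    return total


if __name__ == "__main__":
    for n in (8, 32, 128, 512):
        s = sum_inverse_c(n)
        print(f"n = {n:4d}   sum 1/c = {s:7.3f}   ratio to log n = {s / math.log(n):.3f}")
```

The ratio stays bounded as $n$ grows, in line with (4.75). The dominant contribution comes from small $(p,q)$, where $c(\chi_{p,q}) \approx \pi^2(p^2+q^2)$.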


Proof of Theorem 4.5.24 We write (4.73) as $f = \sum_{\chi\in\hat G}a_\chi\overline\chi/\sqrt{c(\chi)}$, where $a_\chi = \sqrt{c(\chi)}\hat f(\chi)$, so that $\sum_{\chi\in\hat G}|a_\chi|^2 \le 1$ by (4.76). Now we come back to real numbers by taking the real part of the identity $f = \sum_{\chi\in\hat G}a_\chi\overline\chi/\sqrt{c(\chi)}$. This gives an equality of the type $f = \sum_{\chi\in\hat G}(\alpha_\chi\chi' + \beta_\chi\chi'')/\sqrt{c(\chi)}$, where $\sum_{\chi\in\hat G}(\alpha_\chi^2+\beta_\chi^2) \le 1$ and $|\chi'|, |\chi''| \le 1$. That is, the set $\{(f(z_i))_{i\le N};\ f \in \mathcal C^*\}$ is a subset of the ellipsoid $\mathcal E = \{\sum_{k\ge1}a_ku_k;\ \sum_{k\ge1}a_k^2 \le 1\}$. Here the family $(u_k)$ of points of $\mathbb R^N$ consists of the points $(\chi'(z_i)/\sqrt{c(\chi)})_{i\le N}$ and $(\chi''(z_i)/\sqrt{c(\chi)})_{i\le N}$, where $\chi$ takes all possible values in $\hat G$. For such a $u_k$, we have $|u_k(z_i)| \le 1/\sqrt{c(\chi)}$, so that $\|u_k\|^2 = \sum_{i\le N}u_k(z_i)^2 \le N/c(\chi)$. Then by (4.75), we have $\sum_{k\ge1}\|u_k\|^2 \le LN\log n$. Finally we apply Lemma 4.5.27, and we take for $n$ the smallest integer $\ge \sqrt N$. □


Exercise 4.5.31 Let us denote by $\nu$ the uniform measure on $G$ and by $d_\nu$ the distance in the space $L^2(\nu)$. Prove that $\gamma_2(\mathcal C^*,d_\nu) \ge \sqrt{\log n}/L$. Warning: This is
not so easy, and the solution is not provided. Hint: Make sure you understand the 
previous chapter, and construct an appropriate tree. The ingredients on how to build 
that tree are contained in the proof given in the next section, and Sect. 3.2 should 
also be useful. You may assume that N is a power of 2 to save technical work. 
Furthermore, you may also look at [132] where trees were explicitly used. 
Recall that $\mathcal C^*$ denotes the class of 1-Lipschitz functions on the unit square which are zero on the boundary of the square. We shall prove the following, where $(X_i)_{i\le N}$ are i.i.d. uniform in $[0,1]^2$:


Theorem 4.6.1 We have

$$\mathsf E\sup_{f\in\mathcal C^*}\Big|\sum_{i\le N}\Big(f(X_i)-\int f\,d\lambda\Big)\Big| \ge \frac1L\sqrt{N\log N}. \tag{4.78}$$


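Since

$$\sup_{f\in\mathcal C^*}\Big|\sum_{i\le N}\Big(f(X_i)-\int f\,d\lambda\Big)\Big| \le \sup_{f\in\mathcal C^*}\Big|\sum_{i\le N}\big(f(X_i)-f(Y_i)\big)\Big| + \sup_{f\in\mathcal C^*}\Big|\sum_{i\le N}\Big(f(Y_i)-\int f\,d\lambda\Big)\Big|,$$

taking expectation and using (4.41), it follows from (4.78) that if the points $Y_i$ are evenly spread, then (provided $N \ge L$)

$$\mathsf E\sup_{f\in\mathcal C^*}\Big|\sum_{i\le N}\big(f(X_i)-f(Y_i)\big)\Big| \ge \frac1L\sqrt{N\log N},$$

and since $\mathcal C^* \subset \mathcal C$, the duality formula (4.29) implies that the expected cost of matching the points $X_i$ and the points $Y_i$ is at least $\sqrt{N\log N}/L$.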

The proof of Theorem 4.6.1 occupies this entire section and starts now. The beautiful argument we present goes back to [3]. We can expect that this proof is nontrivial. To explain why, let us recall the set $T$ used in the proof of Proposition 4.5.19. Analysis of the proof of that proposition leads us to guess that the bound it provides cannot be improved because $\gamma_2(T,d_2)$ is actually of order $\sqrt{\log N}$ (and not of a smaller order). So a proof of (4.78) must contain a proof that this is the case. In the previous chapter, we learned a technique to prove such results, the construction of "trees". Not surprisingly, our proof implicitly uses a tree which witnesses just this, somewhat similar to the tree we constructed in Sect. 3.2.¹⁴

We may assume $N$ large, and we consider a number $r \in \mathbb N$ which is a small proportion of $\log N$, say, $r \approx (\log N)/100$.¹⁵ The structure of the proof is to recursively construct for $k \le r$ certain (random) functions $f_k$ such that for any $q \le r$

$$\sum_{k\le q}f_k \ \text{is 1-Lipschitz} \tag{4.79}$$

and for each $k \le r$,

$$\mathsf E\sum_{i\le N}\Big(f_k(X_i)-\int f_k\,d\lambda\Big) \ge \frac{\sqrt N}{L\sqrt r}. \tag{4.80}$$

The function $f^* = \sum_{k\le r}f_k$ is then 1-Lipschitz and satisfies

$$\mathsf E\sum_{i\le N}\Big(f^*(X_i)-\int f^*\,d\lambda\Big) \ge \frac1L\sqrt{rN}.$$

This completes the proof of (4.78). The function $f_k$ looks at what happens at scale $2^{-k}$, and (4.80) states that each such scale $2^{-k}$ contributes about equally to the final result.
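Spelled out, summing (4.80) over $k \le r$ and using the choice $r \approx (\log N)/100$ gives the order claimed in (4.78):

```latex
\mathsf E\sum_{i\le N}\Big(f^*(X_i)-\int f^*\,d\lambda\Big)
  = \sum_{k\le r}\mathsf E\sum_{i\le N}\Big(f_k(X_i)-\int f_k\,d\lambda\Big)
  \ge r\cdot\frac{\sqrt N}{L\sqrt r}
  = \frac{\sqrt{rN}}{L},
\qquad
r\approx\frac{\log N}{100}
\;\Longrightarrow\;
\frac{\sqrt{rN}}{L}\approx\frac{\sqrt{N\log N}}{10L}.
```

Each $f_{k,\ell}$ vanishes at 0 and 1, so $f^*$ is zero on the boundary of the square. Being 1-Lipschitz, it therefore belongs to $\mathcal C^*$, and the supremum in (4.78) is at least the value of the sum at $f^*$.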

Following the details of the construction is not that difficult, even though ensuring (4.79) requires technical work. What is more difficult is to see why one makes such choices as we do. There is no magic there: making the right choices means that we have understood which aspect of the geometric complexity of the class $\mathcal C^*$ is relevant here.

The main idea behind the construction of the function $f_k$ is the following. If we divide $[0,1]^2$ into little squares of side $2^{-k}$, then in each of these little squares there is some irregularity of the distribution of the $X_i$. The function $f_k$ is a sum of terms, each corresponding to one of the little squares (see (4.90)). It is designed to, in a sense, add the irregularities over these different squares.


¹⁴ It should also be useful to solve Exercise 4.5.31.

¹⁵ We absolutely need for the proof a number $r$ which is a proportion of $\log N$, and taking a small proportion gives us some room. More specifically, each square of side $2^{-r}$ will have an area larger than, say, $1/\sqrt N$. It will therefore typically contain many points $X_i$, which we will use when we perform normal approximation.
Fig. 4.1 The graphs of two of the functions $f_{k,\ell}$


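The functions $f_k$ will be built out of simple functions which we describe now. For $1 \le k \le r$ and $1 \le \ell \le 2^k$, we consider the function $f'_{k,\ell}$ on $[0,1]$ defined as follows:

$$f'_{k,\ell}(x) = \begin{cases}0 & \text{unless } x\in[(\ell-1)2^{-k},\ell2^{-k}[\\ 1 & \text{for } x\in[(\ell-1)2^{-k},(\ell-1/2)2^{-k}[\\ -1 & \text{for } x\in[(\ell-1/2)2^{-k},\ell2^{-k}[.\end{cases} \tag{4.81}$$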


We define (Fig. 4.1)

$$f_{k,\ell}(x) = \int_0^x f'_{k,\ell}(y)\,dy. \tag{4.82}$$


We now list a few useful properties of these functions. In these formulas, $\|\cdot\|_2$ denotes the norm in $L^2([0,1])$, etc. The proofs of these assertions are completely straightforward and better left to the reader.


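Lemma 4.6.2 The following holds true:

$$\text{The family } (f'_{k,\ell}) \text{ is orthogonal in } L^2([0,1]). \tag{4.83}$$
$$\|f'_{k,\ell}\|_2^2 = 2^{-k} \tag{4.84}$$
$$f_{k,\ell} \ge 0 \tag{4.85}$$
$$f_{k,\ell} \text{ is zero outside } ](\ell-1)2^{-k},\ell2^{-k}[ \tag{4.86}$$
$$\int f_{k,\ell}\,d\lambda \le 2^{-2k} \tag{4.87}$$
$$\|f'_{k,\ell}\|_\infty = 1;\qquad \|f_{k,\ell}\|_\infty = 2^{-k-1} \tag{4.88}$$
$$\|f_{k,\ell}\|_2^2 = \frac{2^{-3k}}{12}. \tag{4.89}$$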
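These identities can be checked by direct computation. The sketch below encodes $f'_{k,\ell}$ from (4.81) and the closed form $f_{k,\ell}(x) = \max(0,\min(x-a,b-x))$ of (4.82) on $[a,b] = [(\ell-1)2^{-k},\ell2^{-k}]$. It then tests (4.83), (4.84), (4.87), (4.88) and (4.89) with a midpoint rule. The closed form, the quadrature and the helper names are our own illustrative choices. The midpoint rule is exact for the piecewise-constant products because the quadrature cells are aligned with the dyadic breakpoints.

```python
def g(k, l, x):
    """The function f'_{k,l} of (4.81): +1, then -1, on the two halves of [(l-1)2^-k, l 2^-k[."""
    a, mid, b = (l - 1) * 2.0 ** -k, (l - 0.5) * 2.0 ** -k, l * 2.0 ** -k
    if a <= x < mid:
        return 1.0
    if mid <= x < b:
        return -1.0
    return 0.0


def tent(k, l, x):
    """The function f_{k,l}(x) = int_0^x f'_{k,l}(y) dy of (4.82), in closed form."""
    a, b = (l - 1) * 2.0 ** -k, l * 2.0 ** -k
    return max(0.0, min(x - a, b - x))


def inner(u, v, m=2 ** 10):
    """Midpoint-rule approximation of the inner product of L^2([0, 1])."""
    h = 1.0 / m
    return h * sum(u((j + 0.5) * h) * v((j + 0.5) * h) for j in range(m))


def check_lemma(kmax=3):
    """Check (4.83), (4.84), (4.87), (4.88) and (4.89) for all k <= kmax."""
    idx = [(k, l) for k in range(1, kmax + 1) for l in range(1, 2 ** k + 1)]
    for i, (k, l) in enumerate(idx):
        gk = lambda x, k=k, l=l: g(k, l, x)
        fk = lambda x, k=k, l=l: tent(k, l, x)
        assert abs(inner(gk, gk) - 2.0 ** -k) < 1e-12                              # (4.84)
        assert abs(inner(fk, fk) - 2.0 ** (-3 * k) / 12) < 1e-4 * 2.0 ** (-3 * k)  # (4.89)
        assert abs(tent(k, l, (l - 0.5) * 2.0 ** -k) - 2.0 ** (-k - 1)) < 1e-15    # (4.88)
        assert inner(fk, lambda x: 1.0) <= 2.0 ** (-2 * k)                          # (4.87)
        for (k2, l2) in idx[i + 1:]:                                                # (4.83)
            assert abs(inner(gk, lambda x: g(k2, l2, x))) < 1e-12
    return True


if __name__ == "__main__":
    print(check_lemma(3))
```

The functions $f'_{k,\ell}$ are, up to normalization, the Haar functions. This is why the orthogonality (4.83) holds across different levels $k$.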
The functions $f_k$ will be of the type

$$f_k = \frac{2^{k-5}}{\sqrt r}\sum_{\ell,\ell'\le2^k}z_{k,\ell,\ell'}\,f_{k,\ell}\otimes f_{k,\ell'}, \tag{4.90}$$

where $f_{k,\ell}\otimes f_{k,\ell'}(x,y) = f_{k,\ell}(x)f_{k,\ell'}(y)$ and $z_{k,\ell,\ell'} \in \{0,1,-1\}$. Note that $f_{k,\ell}\otimes f_{k,\ell'}$ is zero outside the little square $[(\ell-1)2^{-k},\ell2^{-k}[\times[(\ell'-1)2^{-k},\ell'2^{-k}[$ and that these little squares are disjoint as $\ell$ and $\ell'$ vary. The term $z_{k,\ell,\ell'}f_{k,\ell}\otimes f_{k,\ell'}$ is designed to take advantage of the irregularity of the distribution of the $X_i$ in the corresponding little square. The problem of course is to choose the numbers $z_{k,\ell,\ell'}$. There are two different ideas here. First, as a technical device to ensure (4.79), $z_{k,\ell,\ell'}$ may be set to zero. This will happen on a few little squares, those where we are getting dangerously close to violating condition (4.79). The second idea is that we will adjust the signs of $z_{k,\ell,\ell'}$ in such a way that the contributions of the different little squares add properly (rather than canceling each other).

Let us now explain the scaling term $2^{k-5}/\sqrt r$ in (4.90). The coefficient $2^{-5}$ is just a small numerical constant ensuring that we have enough room. The idea of the term $2^k/\sqrt r$ is that the partial derivatives of $f_k$ will be of order $1/\sqrt r$. Taking a sum of at most $r$ such terms and taking cancellations into account will give us partial derivatives which at most points are $\le 1$. This is formally expressed in the next lemma. So, such sums are not necessarily 1-Lipschitz, but are pretty close to being so, and some minor tweaking will ensure that they are.


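Lemma 4.6.3 Consider $q \le r$. Consider a function of the type $f = \sum_{k\le q}f_k$, where $f_k$ is given by (4.90) and where $z_{k,\ell,\ell'} \in \{0,1,-1\}$. Then

$$\Big\|\frac{\partial f}{\partial x}\Big\|_2^2 \le 2^{-12};\qquad \Big\|\frac{\partial f}{\partial y}\Big\|_2^2 \le 2^{-12}. \tag{4.91}$$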
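Proof First we note that

$$\frac{\partial f}{\partial x}(x,y) = \sum_{k\le q}\frac{2^{k-5}}{\sqrt r}\sum_{\ell,\ell'\le2^k}z_{k,\ell,\ell'}f'_{k,\ell}(x)f_{k,\ell'}(y), \tag{4.92}$$

which we rewrite as

$$\frac{\partial f}{\partial x}(x,y) = \sum_{k\le q}\frac{2^{k-5}}{\sqrt r}\sum_{\ell\le2^k}a_{k,\ell}(y)f'_{k,\ell}(x),$$

where $a_{k,\ell}(y) = \sum_{\ell'\le2^k}z_{k,\ell,\ell'}f_{k,\ell'}(y)$. Using (4.83), we obtain

$$\int\Big(\frac{\partial f}{\partial x}(x,y)\Big)^2dx = \sum_{k\le q}\frac{2^{2k-10}}r\sum_{\ell\le2^k}a_{k,\ell}(y)^2\|f'_{k,\ell}\|_2^2.$$

Since $z_{k,\ell,\ell'}^2 \le 1$ and since the functions $(f_{k,\ell'})_{\ell'\le2^k}$ have disjoint supports, we have $a_{k,\ell}(y)^2 \le \sum_{\ell'\le2^k}f_{k,\ell'}(y)^2$, so that

$$\int\Big(\frac{\partial f}{\partial x}(x,y)\Big)^2dx \le \sum_{k\le q}\frac{2^{2k-10}}r\sum_{\ell,\ell'\le2^k}f_{k,\ell'}(y)^2\|f'_{k,\ell}\|_2^2.$$

Integrating in $y$ and using (4.84) and (4.89) yields

$$\Big\|\frac{\partial f}{\partial x}\Big\|_2^2 \le \sum_{k\le q}\frac{2^{2k-10}}r\,2^{2k}\,2^{-k}\,\frac{2^{-3k}}{12} = \frac{q\,2^{-10}}{12\,r} \le 2^{-12}.$$

Naturally, we have the same bound for $\|\partial f/\partial y\|_2$. These bounds do not imply that $f$ is 1-Lipschitz, but they imply that it is 1-Lipschitz "most of the time".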

We construct the functions $f_k$ recursively. Having constructed $f_1,\dots,f_q$, let $f = \sum_{k\le q}f_k$, and assume that it is 1-Lipschitz. We will construct $f_{q+1}$ of the type (4.90) by choosing the coefficients $z_{q+1,\ell,\ell'}$. Let us say that a square of the type

$$[(\ell-1)2^{-q},\ell2^{-q}[\times[(\ell'-1)2^{-q},\ell'2^{-q}[ \tag{4.93}$$

for $1 \le \ell,\ell' \le 2^q$ is a $q$-square. There are $2^{2q}$ such $q$-squares.


Definition 4.6.4 We say that a $(q+1)$-square is dangerous if it contains a point for which either $|\partial f/\partial x| \ge 1/2$ or $|\partial f/\partial y| \ge 1/2$. We say that it is safe if it is not dangerous.

The danger is that on this square (4.93), the function $f + f_{q+1}$ might not be 1-Lipschitz.


Lemma 4.6.5 At most half of the (q + 1)-squares are dangerous, so at least half of 
the (q + 1)-squares are safe. 


This lemma is a consequence of the fact that “f is 1-Lipschitz most of the time.” 
The proof is a bit technical, so we delay it to the end of the section. 

The following is also a bit technical but is certainly expected. It will also be 
proved later. 


Lemma 4.6.6 If $z_{q+1,\ell,\ell'} = 0$ whenever the corresponding $(q+1)$-square (4.93) is dangerous, then $f + f_{q+1}$ is 1-Lipschitz.


We now complete the construction of the function $f_{q+1}$. For a dangerous square, we set $z_{q+1,\ell,\ell'} = 0$. Let us define

$$h_{\ell,\ell'}(x) = f_{q+1,\ell}\otimes f_{q+1,\ell'}(x) - \int f_{q+1,\ell}\otimes f_{q+1,\ell'}\,d\lambda.$$
Using (4.89) and (4.87), we obtain

$$\|h_{\ell,\ell'}\|_2^2 \ge 2^{-6q}/L. \tag{4.94}$$

Let us then define

$$D_{\ell,\ell'} = \sum_{i\le N}h_{\ell,\ell'}(X_i). \tag{4.95}$$

For a safe square, we choose $z_{q+1,\ell,\ell'} = \pm1$ such that

$$z_{q+1,\ell,\ell'}D_{\ell,\ell'} = |D_{\ell,\ell'}|.$$

Thus, if

$$f_{q+1} = \frac{2^{q-4}}{\sqrt r}\sum_{\ell,\ell'\le2^{q+1}}z_{q+1,\ell,\ell'}f_{q+1,\ell}\otimes f_{q+1,\ell'},$$

we have

$$\sum_{i\le N}\Big(f_{q+1}(X_i)-\int f_{q+1}\,d\lambda\Big) = \frac{2^{q-4}}{\sqrt r}\sum_{\text{safe}}|D_{\ell,\ell'}|, \tag{4.96}$$

where the sum is over all values of $(\ell,\ell')$ such that the corresponding square (4.93) is safe.

We turn to the proof of (4.80), and for this we estimate $\mathsf E\sum_{\text{safe}}|D_{\ell,\ell'}|$. An obvious obstacle to performing this estimate is that the r.v.s $D_{\ell,\ell'}$ are not independent of the set of safe squares. But we know that at least half of the squares are safe, so we can bound $\sum_{\text{safe}}|D_{\ell,\ell'}|$ from below by the sum of the $2^{2q+1}$ smallest among the $2^{2q+2}$ r.v.s $|D_{\ell,\ell'}|$.

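Let us estimate $\mathsf ED_{\ell,\ell'}^2$. By definition (4.95), $D_{\ell,\ell'}$ is a sum $\sum_{i\le N}h_{\ell,\ell'}(X_i)$ of independent centered r.v.s, so that $\mathsf ED_{\ell,\ell'}^2 = N\|h_{\ell,\ell'}\|_2^2$. Using (4.94), we obtain an estimate

$$\mathsf ED_{\ell,\ell'}^2 \ge 2^{-6q}N/L. \tag{4.97}$$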


Let us then pretend for a moment that the r.v.s $D_{\ell,\ell'}$ are Gaussian and independent as $\ell,\ell'$ vary. For a Gaussian r.v. $g$, we have $\mathsf P(|g| \ge (\mathsf Eg^2)^{1/2}/100) \ge 7/8$. Then for each $\ell,\ell'$, we have $|D_{\ell,\ell'}| \ge 2^{-3q}\sqrt N/L$ with probability $\ge 7/8$. In other words, the r.v. $Y_{\ell,\ell'} = \mathbf 1_{\{|D_{\ell,\ell'}|\ge2^{-3q}\sqrt N/L\}}$ satisfies $\mathsf EY_{\ell,\ell'} \ge 7/8$. Then Bernstein's inequality shows that with overwhelming probability, at least 3/4 of these variables equal 1. For further use let us state the following more general principle. Consider $M$ independent r.v.s $Z_i \in \{0,1\}$ with $\mathsf P(Z_i = 1) = a_i = \mathsf EZ_i$. Then for $u > 0$, we have

$$\mathsf P\Big(\Big|\sum_{i\le M}(Z_i-a_i)\Big| \ge uM\Big) \le 2\exp(-Mu^2/L), \tag{4.98}$$

and in particular $\mathsf P(\sum_{i\le M}Z_i \le \sum_{i\le M}a_i - Mu) \le 2\exp(-Mu^2/L)$.
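Both numerical facts used here can be checked quickly. For the Gaussian fact, $\mathsf P(|g| \ge 1/100) = 1-\mathrm{erf}(0.01/\sqrt2)$. For (4.98), one classical source is Hoeffding's inequality, which we use here in place of Bernstein's. For variables in $\{0,1\}$ it gives the bound $2\exp(-2Mu^2)$, that is, (4.98) with $L = 1/2$. The sketch below computes both and compares the Hoeffding bound with a small simulation. The parameters in the usage example ($M = 200$, $a_i = 7/8$, $u = 0.1$) are arbitrary illustrative choices.

```python
import math
import random


def prob_abs_gaussian_at_least(c):
    """P(|g| >= c) for a standard Gaussian g, via the error function."""
    return 1.0 - math.erf(c / math.sqrt(2.0))


def hoeffding_tail(m, u):
    """Hoeffding's bound 2 exp(-2 M u^2): an instance of (4.98) with L = 1/2."""
    return 2.0 * math.exp(-2.0 * m * u * u)


def empirical_tail(probs, u, trials=5000, seed=1):
    """Frequency of |sum_i (Z_i - a_i)| >= u M for independent Z_i ~ Bernoulli(a_i)."""
    rng = random.Random(seed)
    m, mean = len(probs), sum(probs)
    hits = 0
    for _ in range(trials):
        s = sum(1 for a in probs if rng.random() < a)
        if abs(s - mean) >= u * m:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(prob_abs_gaussian_at_least(0.01))  # about 0.992, well above 7/8
    probs = [7 / 8] * 200
    print(empirical_tail(probs, 0.1), hoeffding_tail(200, 0.1))
```

The Gaussian estimate is far from tight: $(\mathsf Eg^2)^{1/2}/100$ is a very small threshold, which leaves room for the normal approximation mentioned below.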

Thus, $|D_{\ell,\ell'}| \ge 2^{-3q}\sqrt N/L$ for at least 3/4 of the squares, so that at least 1/4 of the squares are both safe and satisfy this inequality. Consequently, it follows as desired from (4.96) that (4.80) holds (for $q+1$ rather than $q$).
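The arithmetic behind this last step is as follows. We work on the event where at least $2^{2q+2}/4 = 2^{2q}$ squares are both safe and satisfy $|D_{\ell,\ell'}| \ge 2^{-3q}\sqrt N/L$. On that event:

```latex
\sum_{i\le N}\Big(f_{q+1}(X_i)-\int f_{q+1}\,d\lambda\Big)
  = \frac{2^{q-4}}{\sqrt r}\sum_{\text{safe}}|D_{\ell,\ell'}|
  \ge \frac{2^{q-4}}{\sqrt r}\cdot 2^{2q}\cdot\frac{2^{-3q}\sqrt N}{L}
  = \frac{\sqrt N}{2^{4}L\sqrt r}.
```

The powers of $2^q$ cancel exactly, which is why every scale contributes the same amount. The left-hand side of (4.96) is always nonnegative, so taking expectation over this high-probability event gives (4.80) for $q+1$ after absorbing $2^4$ into $L$.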

It is not exactly true that the r.v.s $D_{\ell,\ell'}$ are independent and Gaussian. Standard techniques exist to take care of this, namely, Poissonization and normal approximation. There is all the room in the world because $r \le (\log N)/100$. As these considerations are not related to the rest of the material of this work, they are better omitted.

We now turn to the proofs of Lemmas 4.6.5 and 4.6.6. The next lemma prepares 
for these proofs. 


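Lemma 4.6.7 Consider $q \le r$ and a function of the type $f = \sum_{k\le q}f_k$, where $f_k$ is given by (4.90) and where $z_{k,\ell,\ell'} \in \{0,1,-1\}$. Then

$$\Big\|\frac{\partial^2f}{\partial x\partial y}\Big\|_\infty \le \frac{2^{q-4}}{\sqrt r}. \tag{4.99}$$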
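Proof We have

$$\Big|\frac{\partial^2f}{\partial x\partial y}\Big| = \Big|\sum_{k\le q}\frac{2^{k-5}}{\sqrt r}\sum_{\ell,\ell'\le2^k}z_{k,\ell,\ell'}f'_{k,\ell}\otimes f'_{k,\ell'}\Big| \le \sum_{k\le q}\frac{2^{k-5}}{\sqrt r}\sum_{\ell,\ell'\le2^k}|f'_{k,\ell}\otimes f'_{k,\ell'}|.$$

The functions $f'_{k,\ell}\otimes f'_{k,\ell'}$ have disjoint supports, and by the first part of (4.88), $|f'_{k,\ell}\otimes f'_{k,\ell'}| \le 1$. Also, $\sum_{k\le q}2^k \le 2^{q+1}$. □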


Proof of Lemma 4.6.5 We observe from the definition that all functions $f'_{k,\ell}$ for $k \le q$ are constant on the intervals $[\ell2^{-q-1},(\ell+1)2^{-q-1}[$. Thus according to (4.92), on a $(q+1)$-square, $\partial f/\partial x$ does not depend on $x$. If $(x,y)$ and $(x',y')$ belong to the same $(q+1)$-square, then

$$\frac{\partial f}{\partial x}(x,y') = \frac{\partial f}{\partial x}(x',y'). \tag{4.100}$$

Moreover, $|y-y'| \le 2^{-q-1}$, so that (4.99) implies

$$\Big|\frac{\partial f}{\partial x}(x,y) - \frac{\partial f}{\partial x}(x,y')\Big| \le |y-y'|\,\frac{2^{q-4}}{\sqrt r} \le \frac{2^{-5}}{\sqrt r},$$

and combining with (4.100), we obtain

$$\Big|\frac{\partial f}{\partial x}(x,y) - \frac{\partial f}{\partial x}(x',y')\Big| \le \frac{2^{-5}}{\sqrt r}.$$

In particular, if a $(q+1)$-square contains a point at which $|\partial f/\partial x| \ge 1/2$, then at each point of this square, we have $|\partial f/\partial x| \ge 1/2 - 2^{-5}/\sqrt r \ge 1/4$. The proportion $\alpha$ of $(q+1)$-squares with this property satisfies $\alpha(1/4)^2 \le \|\partial f/\partial x\|_2^2 \le 2^{-12}$, where we have used (4.91) in the last inequality. This implies that at most a proportion $2^{-8}$ of $(q+1)$-squares can contain a point with $|\partial f/\partial x| \ge 1/2$. Repeating the same argument for $\partial f/\partial y$ shows that, as desired, at most half of the $(q+1)$-squares are dangerous. □


Proof of Lemma 4.6.6 To ensure that $g := f + f_{q+1}$ is 1-Lipschitz, it suffices to ensure that it is 1-Lipschitz on each $(q+1)$-square. When the square is dangerous, $f_{q+1} = 0$ on this square by construction, and $g$ is 1-Lipschitz on it because there $g = f$ and $f$ is 1-Lipschitz.

When the square is safe, everywhere on the square we have $|\partial f/\partial x| \le 1/2$ and $|\partial f/\partial y| \le 1/2$. Now the second part of (4.88) implies

$$\Big|\frac{\partial f_{q+1}}{\partial x}\Big| = \Big|\frac{2^{q-4}}{\sqrt r}\sum_{\ell,\ell'\le2^{q+1}}z_{q+1,\ell,\ell'}f'_{q+1,\ell}\otimes f_{q+1,\ell'}\Big| \le \frac1{2^5\sqrt r}$$

and

$$\Big|\frac{\partial f_{q+1}}{\partial y}\Big| = \Big|\frac{2^{q-4}}{\sqrt r}\sum_{\ell,\ell'\le2^{q+1}}z_{q+1,\ell,\ell'}f_{q+1,\ell}\otimes f'_{q+1,\ell'}\Big| \le \frac1{2^5\sqrt r},$$

where we have used that the elements of the sum have disjoint supports. So we are certain that at each point of a safe square we have $|\partial g/\partial x| \le 1/\sqrt2$ and $|\partial g/\partial y| \le 1/\sqrt2$, and hence that $g$ is 1-Lipschitz on a safe square. □


Exercise 4.6.8 This is a continuation of Exercise 4.5.23. Adapt the method you 
learned in this section to prove that the bound (4.69) is of the correct order. 
Theorem 4.7.1 ([55]) If the points $(Y_i)_{i\le N}$ are evenly spread and if $(X_i)_{i\le N}$ are i.i.d. uniform over $[0,1]^2$, then (for $N \ge 2$), with probability at least $1 - L\exp(-(\log N)^{3/2}/L)$, we have

$$\inf_\pi\sup_{i\le N}d(X_i,Y_{\pi(i)}) \le L\frac{(\log N)^{3/4}}{\sqrt N}. \tag{4.101}$$
In particular

$$\mathsf E\inf_\pi\sup_{i\le N}d(X_i,Y_{\pi(i)}) \le L\frac{(\log N)^{3/4}}{\sqrt N}. \tag{4.102}$$

To deduce (4.102) from (4.101), one simply uses any matching in the (rare) event that (4.101) fails. Any matching satisfies $\sup_{i\le N}d(X_i,Y_{\pi(i)}) \le \sqrt2$, and $\sqrt2\,L\exp(-(\log N)^{3/2}/L) \le L(\log N)^{3/4}/\sqrt N$. We shall prove in Sect. 4.8 that the inequality (4.102) can be reversed. A close cousin of this theorem can be found in Appendix A.
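To make the quantity $\inf_\pi\sup_{i\le N}d(X_i,Y_{\pi(i)})$ of (4.101) concrete, here is a minimal sketch that computes it exactly for small $N$. It runs a binary search over the candidate distances, testing at each step whether a perfect matching exists using Kuhn's augmenting-path algorithm. The algorithm, the helper names and the use of grid centers as "evenly spread" points in the usage example are our own illustrative assumptions. They play no role in the argument of the text.

```python
import math
import random


def bottleneck_matching(xs, ys):
    """Return min over bijections pi of max_i |xs[i] - ys[pi(i)]| (Euclidean)."""
    n = len(xs)
    dist = [[math.dist(x, y) for y in ys] for x in xs]
    candidates = sorted({d for row in dist for d in row})

    def perfect_matching_within(radius):
        """Kuhn's test: can each x get a distinct y at distance <= radius?"""
        match_of_y = [-1] * n  # match_of_y[j] = index of the x matched to ys[j]

        def augment(i, seen):
            for j in range(n):
                if not seen[j] and dist[i][j] <= radius:
                    seen[j] = True
                    if match_of_y[j] == -1 or augment(match_of_y[j], seen):
                        match_of_y[j] = i
                        return True
            return False

        return all(augment(i, [False] * n) for i in range(n))

    # The optimum is one of the pairwise distances: find the smallest that works.
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if perfect_matching_within(candidates[mid]):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]


if __name__ == "__main__":
    rng = random.Random(0)
    side = 8
    n = side * side
    # Centers of a side x side grid: one simple (assumed) evenly spread family.
    ys = [((a + 0.5) / side, (b + 0.5) / side) for a in range(side) for b in range(side)]
    xs = [(rng.random(), rng.random()) for _ in range(n)]
    print(bottleneck_matching(xs, ys), math.log(n) ** 0.75 / math.sqrt(n))
```

For such small $N$ the constant $L$ dominates, so the printed pair only illustrates the scale $(\log N)^{3/4}/\sqrt N$. It is not evidence for the theorem.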

A first simple idea is that to prove Theorem 4.7.1, we do not care about what happens at a scale smaller than $(\log N)^{3/4}/\sqrt N$. Therefore, consider the largest integer $\ell_1$ with $2^{-\ell_1} \ge (\log N)^{3/4}/\sqrt N$ (so that in particular $2^{\ell_1} \le \sqrt N$). We divide $[0,1]^2$ into little squares of side $2^{-\ell_1}$. For each such square, we are interested in how many points $X_i$ it contains, but we do not care where these points are located in the square. We shall deduce Theorem 4.7.1 from a discrepancy theorem for a certain class of functions.¹⁶ What we really have in mind is the class of functions which are indicators of a union $A$ of little squares with sides of length $2^{-\ell_1}$ and such that the boundary of $A$ has a given length. It turns out that we shall have to parametrize the boundaries of these sets by curves, so it is convenient to turn things around and to consider the class of sets $A$ that are the interiors of curves of given length.
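For concreteness, the following sketch computes $\ell_1$ for a given $N$ and confirms $2^{\ell_1} \le \sqrt N$. It assumes that $\log$ denotes the natural logarithm; the choice of base only changes the constants. The function name is hypothetical.

```python
import math


def ell_1(n_points):
    """Largest integer l >= 0 with 2^-l >= (log N)^{3/4} / sqrt(N).

    Natural logarithm assumed.  For N >= 2 the threshold is below 1, so l = 0
    always qualifies and the loop below terminates.
    """
    threshold = math.log(n_points) ** 0.75 / math.sqrt(n_points)
    l = 0
    while 2.0 ** -(l + 1) >= threshold:
        l += 1
    return l


if __name__ == "__main__":
    for n in (10 ** 3, 10 ** 6, 10 ** 9):
        l = ell_1(n)
        print(n, l, 2 ** l, math.sqrt(n))
```

The little squares thus have side about $(\log N)^{3/4}/\sqrt N$, and each typically contains about $(\log N)^{3/2}$ points $X_i$.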

To make things precise, let us define the grid $G$ of $[0,1]^2$ of mesh width $2^{-\ell_1}$ by

$$G = \{(x_1,x_2)\in[0,1]^2;\ 2^{\ell_1}x_1\in\mathbb N \text{ or } 2^{\ell_1}x_2\in\mathbb N\}.$$

A vertex of the grid is a point $(x_1,x_2) \in [0,1]^2$ with $2^{\ell_1}x_1 \in \mathbb N$ and $2^{\ell_1}x_2 \in \mathbb N$. There are $(2^{\ell_1}+1)^2$ vertices. An edge of the grid is the segment between two vertices that are at distance $2^{-\ell_1}$ of each other. A square of the grid is a square of side $2^{-\ell_1}$ whose edges are edges of the grid. Thus, an edge of the grid is a subset of the grid, but a square of the grid is not a subset of the grid (see Fig. 4.2).
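The counts attached to this grid reappear in the proof of Theorem 4.7.2. There, the length of a curve traced on $G$ is bounded by the total length $2(2^{\ell_1}+1) \le 2^{\ell_1+2}$ of all edges. The small sketch below records these counts, using exact rational arithmetic for the length. The function name is hypothetical.

```python
from fractions import Fraction


def grid_counts(l1):
    """Vertices, edges and total edge length of the grid G of mesh 2^{-l1} on [0,1]^2."""
    m = 2 ** l1                        # number of edges along each side of the unit square
    vertices = (m + 1) ** 2            # points with 2^{l1} x_1 and 2^{l1} x_2 both integers
    edges = 2 * m * (m + 1)            # m edges on each of m + 1 horizontal and m + 1 vertical lines
    total_length = Fraction(edges, m)  # each edge has length 2^{-l1}; equals 2 (2^{l1} + 1)
    return vertices, edges, total_length


if __name__ == "__main__":
    for l1 in range(5):
        print(l1, grid_counts(l1))
```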

A curve is the image of a continuous map $\varphi: [0,1] \to \mathbb R^2$. We say that the curve is a simple curve if $\varphi$ is one-to-one on $[0,1[$. We say that the curve is traced on $G$ if $\varphi([0,1]) \subset G$ and that it is closed if $\varphi(0) = \varphi(1)$. If $C$ is a closed simple curve in $\mathbb R^2$, the set $\mathbb R^2\setminus C$ has two connected components. One of these is bounded. It is called the interior of $C$ and is denoted by $\mathring C$.

The proof of Theorem 4.7.1 has a probabilistic part (the hard one) and a deterministic part. The probabilistic part states that with high probability the number of points inside a closed curve differs from its expected value by at most the length of the curve times $L\sqrt N(\log N)^{3/4}$. The deterministic part will be given at the end of the section and will show how to deduce Theorem 4.7.1 from Theorem 4.7.2.

¹⁶ This is the case for every matching theorem we prove.
Fig. 4.2 A square A, an edge e, a vertex V, and a simple curve C traced on G

Theorem 4.7.2 With probability at least $1 - L\exp(-(\log N)^{3/2}/L)$, the following occurs: Given any closed simple curve $C$ traced on $G$, we have

$$\Big|\sum_{i\le N}\big(\mathbf 1_{\mathring C}(X_i)-\lambda(\mathring C)\big)\Big| \le L\,\ell(C)\sqrt N(\log N)^{3/4}, \tag{4.103}$$

where $\lambda(\mathring C)$ is the area of $\mathring C$ and $\ell(C)$ is the length of $C$.


We will reduce the proof of this theorem to the following result, which concerns 
curves of a given length going through a given vertex: 


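Proposition 4.7.3 Consider a vertex $\tau$ of $G$ and $k \in \mathbb Z$. Define $\mathcal C(\tau,k)$ as the set of closed simple curves traced on $G$ that pass through $\tau$¹⁷ and have length $\le 2^k$. Then, if $-\ell_1 \le k \le \ell_1+2$, with probability at least $1 - L\exp(-(\log N)^{3/2}/L)$, for each $C \in \mathcal C(\tau,k)$, we have

$$\Big|\sum_{i\le N}\big(\mathbf 1_{\mathring C}(X_i)-\lambda(\mathring C)\big)\Big| \le L2^k\sqrt N(\log N)^{3/4}. \tag{4.104}$$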
i<N 


It would be easy to control the left-hand side if one considered only curves with 
a simple pattern, such as boundaries of rectangles. The point, however, is that the 
curves we consider can be very complicated and the longer we allow them to be, 
the more so. Before we discuss Proposition 4.7.3 further, we show that it implies 
Theorem 4.7.2. 


¹⁷ That is, $\tau$ is an end vertex of an edge which belongs to the curve.
Proof of Theorem 4.7.2 Since there are at most $(2^{\ell_1}+1)^2 \le LN$ choices for the vertex $\tau$, we can assume with probability at least

$$1 - L(2^{\ell_1}+1)^2(2\ell_1+4)\exp(-(\log N)^{3/2}/L) \ge 1 - L'\exp(-(\log N)^{3/2}/L') \tag{4.105}$$

that (4.104) occurs for all choices of $C \in \mathcal C(\tau,k)$, for any $\tau$ and any $k$ with $-\ell_1 \le k \le \ell_1+2$.

Consider a simple curve $C$ traced on $G$. Bounding the length of $C$ by the total length of the edges of $G$, we have $2^{-\ell_1} \le \ell(C) \le 2(2^{\ell_1}+1) \le 2^{\ell_1+2}$. Then the smallest integer $k$ for which $\ell(C) \le 2^k$ satisfies $-\ell_1 \le k \le \ell_1+2$. Since $2^k \le 2\ell(C)$, the proof is finished by (4.104). □


Exercise 4.7.4 Prove the second inequality in (4.105) in complete detail. 
The main step to prove Proposition 4.7.3 is the following: 


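Proposition 4.7.5 Consider a vertex $\tau$ of $G$ and $k \in \mathbb Z$. Define $\mathcal C(\tau,k)$ as in Proposition 4.7.3. Then, if $-\ell_1 \le k \le \ell_1+2$, we have

$$\mathsf E\sup_{C\in\mathcal C(\tau,k)}\Big|\sum_{i\le N}\big(\mathbf 1_{\mathring C}(X_i)-\lambda(\mathring C)\big)\Big| \le L2^k\sqrt N(\log N)^{3/4}. \tag{4.106}$$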


Proof of Proposition 4.7.3 To prove Proposition 4.7.3, we have to go from the control in expectation provided by (4.106) to the control in probability of (4.104). There is a powerful tool to do this: concentration of measure. The function

$$f(x_1,\dots,x_N) = \sup_{C\in\mathcal C(\tau,k)}\Big|\sum_{i\le N}\big(\mathbf 1_{\mathring C}(x_i)-\lambda(\mathring C)\big)\Big|$$

of points $x_1,\dots,x_N \in [0,1]^2$ has the property that changing the value of a given variable $x_i$ can change the value of $f$ by at most one. One of the earliest "concentration of measure" results (for which we refer to [52]) asserts that for such a function, the r.v. $W = f(X_1,\dots,X_N)$ satisfies a deviation inequality of the form

$$\mathsf P(|W-\mathsf EW| \ge u) \le 2\exp\Big(-\frac{u^2}{2N}\Big). \tag{4.107}$$

Using (4.106) to control $\mathsf EW$ and taking $u = L2^k\sqrt N(\log N)^{3/4}$ proves Proposition 4.7.3 in the case $k \ge 0$. A little bit more work is needed when $k < 0$. In that case, a curve of length $2^k$ is entirely contained in the square $V$ of center $\tau$ and side $2^{k+1}$, and $\mathbf 1_{\mathring C}(X_i) = 0$ unless $X_i \in V$. To take advantage of this, we work conditionally on $J = \{i \le N;\ X_i \in V\}$, and we can then use (4.107) with $\mathrm{card}\,J$ instead of $N$. This provides the desired inequality when $\mathrm{card}\,J \le L2^{2k}N$. On the other hand, by (4.98) and since $\lambda(V) \le 2^{2k+2}$, we have $\mathsf P(\mathrm{card}\,J \ge L2^{2k}N) \le \exp(-N2^{2k}) \le L\exp(-(\log N)^{3/2}/L)$, because $k \ge -\ell_1$ and by the choice of $\ell_1$. □


We start the proof of Proposition 4.7.5. We denote by $\mathcal F_k$ the class of functions of the type $\mathbf 1_{\mathring C}$, where $C \in \mathcal C(\tau,k)$, so we can rewrite (4.106) as

$$\mathsf E\sup_{f\in\mathcal F_k}\Big|\sum_{i\le N}\Big(f(X_i)-\int f\,d\lambda\Big)\Big| \le L2^k\sqrt N(\log N)^{3/4}. \tag{4.108}$$

The key point again is the control on the size of $\mathcal F_k$ with respect to the distance of $L^2(\lambda)$. The difficult part of this control is the following:

Proposition 4.7.6 We have

$$\gamma_2(\mathcal F_k,d_2) \le L2^k(\log N)^{3/4}. \tag{4.109}$$


Another much easier fact is the following: 


Proposition 4.7.7 We have

$$\gamma_1(\mathcal F_k,d_\infty) \le L2^k\sqrt N. \tag{4.110}$$


Proof of (4.108) and of Proposition 4.7.5 Combine Propositions 4.7.6 and 4.7.7 and Theorem 4.5.16. □


Let us first prove the easy Proposition 4.7.7. 


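Lemma 4.7.8 We have $\mathrm{card}\,\mathcal C(\tau,k) \le 4^{2^{k+\ell_1}} = N_{k+\ell_1+1}$.

Proof A curve $C \in \mathcal C(\tau,k)$ consists of at most $2^{k+\ell_1}$ edges of $G$. If we move through $C$, at each vertex of $G$, we have at most four choices for the next edge, so $\mathrm{card}\,\mathcal C(\tau,k) \le 4^{2^{k+\ell_1}} = N_{k+\ell_1+1}$. □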
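The count in Lemma 4.7.8 is very crude. As a sanity check, the sketch below enumerates all closed simple lattice curves through a fixed vertex made of at most $m$ edges and compares their number with $4^m$. It uses a depth-first search over self-avoiding walks. Two simplifying assumptions are made: the search runs on the infinite lattice $\mathbb Z^2$ with mesh rescaled to 1, and it ignores the boundary of $[0,1]^2$. Both only increase the count. The function name is hypothetical.

```python
def closed_simple_curves_through_origin(max_edges):
    """Count closed simple lattice curves through (0, 0), by number of edges (<= max_edges).

    Depth-first search over self-avoiding walks from the origin.  A walk that
    steps back onto the origin after at least 4 edges closes a simple curve.
    Each curve is found twice (once per orientation), hence the division by 2.
    """
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    counts = {}

    def extend(pos, visited, length):
        for dx, dy in steps:
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt == (0, 0) and length + 1 >= 4:
                counts[length + 1] = counts.get(length + 1, 0) + 1
            elif nxt not in visited and length + 1 < max_edges:
                visited.add(nxt)
                extend(nxt, visited, length + 1)
                visited.remove(nxt)

    extend((0, 0), {(0, 0)}, 0)
    return {m: c // 2 for m, c in sorted(counts.items())}


if __name__ == "__main__":
    m = 10
    by_length = closed_simple_curves_through_origin(m)
    print(by_length, sum(by_length.values()), 4 ** m)
```

The true counts grow exponentially in $m$ with a rate well below 4. For the purposes of Lemma 4.7.9, only the form $N_{k+\ell_1+1}$ of the bound matters.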


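Proof of Proposition 4.7.7 Generally speaking, a set $T$ of cardinality $\le N_m$ and diameter $\Delta$ satisfies $\gamma_1(T,d) \le L\Delta2^m$, as is shown by taking $\mathcal A_n = \{T\}$ for $n < m$ and $A_m(t) = \{t\}$. We use this for $T = \mathcal F_k$, whose diameter for $d_\infty$ is $\le 1$, so that $\mathrm{card}\,T = \mathrm{card}\,\mathcal C(\tau,k) \le N_{k+\ell_1+1}$ by Lemma 4.7.8, and $2^{k+\ell_1+1} \le L2^k\sqrt N$. □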


We now attack the difficult part, the proof of Proposition 4.7.6. The exponent 3/4 occurs through the following general principle, where we recall that if $d$ is a distance, so is $\sqrt d$:

Lemma 4.7.9 Consider a finite metric space $(T,d)$ with $\mathrm{card}\,T \le N_m$. Then

$$\gamma_2(T,\sqrt d) \le m^{3/4}\gamma_{1,2}(T,d)^{1/2}. \tag{4.111}$$
Proof Since $T$ is finite, there exists an admissible sequence $(\mathcal A_n)$ of $T$ such that

$$\forall t\in T,\quad \sum_{n\ge0}\big(2^n\Delta(A_n(t),d)\big)^2 \le \gamma_{1,2}(T,d)^2. \tag{4.112}$$

Without loss of generality, we can assume that $A_m(t) = \{t\}$ for each $t$, so that in (4.112) the sum is over $n \le m-1$. Now

$$\Delta(A,\sqrt d) = \Delta(A,d)^{1/2},$$

so that, using Hölder's inequality,

$$\sum_{0\le n\le m-1}2^{n/2}\Delta(A_n(t),\sqrt d) = \sum_{0\le n\le m-1}\big(2^n\Delta(A_n(t),d)\big)^{1/2} \le m^{3/4}\Big(\sum_{n\ge0}\big(2^n\Delta(A_n(t),d)\big)^2\Big)^{1/4} \le m^{3/4}\gamma_{1,2}(T,d)^{1/2},$$

which concludes the proof. □
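The only analytic input in the last proof is Hölder's inequality with exponents 4 and 4/3. For nonnegative $a_0,\dots,a_{m-1}$ it reads $\sum_n a_n^{1/2} \le m^{3/4}(\sum_n a_n^2)^{1/4}$, applied with $a_n = 2^n\Delta(A_n(t),d)$. The short sketch below evaluates both sides; equality holds for constant sequences. The helper name is hypothetical.

```python
import random


def holder_sides(a):
    """Both sides of sum_{n<m} a_n^{1/2} <= m^{3/4} (sum_{n<m} a_n^2)^{1/4}, for a_n >= 0.

    In the proof of Lemma 4.7.9, a_n = 2^n Delta(A_n(t), d).
    """
    m = len(a)
    lhs = sum(x ** 0.5 for x in a)
    rhs = m ** 0.75 * sum(x * x for x in a) ** 0.25
    return lhs, rhs


if __name__ == "__main__":
    rng = random.Random(0)
    a = [rng.expovariate(1.0) * 2 ** n for n in range(10)]
    print(holder_sides(a))
```

The loss $m^{3/4}$ is what turns $m \le L\log N$ into the factor $(\log N)^{3/4}$ in (4.109).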


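Let us denote by $A\Delta B$ the symmetric difference $(A\setminus B)\cup(B\setminus A)$ between two sets $A$ and $B$. On the set of closed simple curves traced on $G$, we define the distance $d_1$ by $d_1(C,C') = \lambda(\mathring C\Delta\mathring C')$ and the distance

$$\delta(C_1,C_2) = \|\mathbf 1_{\mathring C_1}-\mathbf 1_{\mathring C_2}\|_2 = \big(\lambda(\mathring C_1\Delta\mathring C_2)\big)^{1/2} = \big(d_1(C_1,C_2)\big)^{1/2}, \tag{4.113}$$

so that

$$\gamma_2(\mathcal F_k,d_2) = \gamma_2(\mathcal C(\tau,k),\delta) = \gamma_2\big(\mathcal C(\tau,k),\sqrt{d_1}\big),$$

and using Lemma 4.7.8 and (4.111) for $m := k+\ell_1+1$, we obtain

$$\gamma_2(\mathcal F_k,d_2) \le L(\log N)^{3/4}\gamma_{1,2}(\mathcal C(\tau,k),d_1)^{1/2},$$

because $m \le L\log N$ for $k \le \ell_1+2$.
Therefore, it remains only to prove the following: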


Proposition 4.7.10 We have

$$\gamma_{1,2}(\mathcal C(\tau,k),d_1) \le L2^{2k}. \tag{4.114}$$
The reason why this is true is that the metric space $(\mathcal L,d_2)$ of Proposition 4.1.8 satisfies $\gamma_{1,2}(\mathcal L,d_2) < \infty$, while $(\mathcal C(\tau,k),d_1)$ is a Lipschitz image of a subset of this metric space $(\mathcal L,d_2)$. The elementary proof of the following may be found in Sect. B.2.


Lemma 4.7.11 There exists a map $W$ from a subset $T$ of $\mathcal L$ onto $\mathcal C(\tau,k)$ which for any $f_0,f_1 \in T$ satisfies

$$d_1(W(f_0),W(f_1)) \le L2^{2k}\|f_0-f_1\|_2. \tag{4.115}$$


To conclude the proof of Proposition 4.7.10, we check that the functionals $\gamma_{\alpha,\beta}$ behave as expected under Lipschitz maps.


Lemma 4.7.12 Consider two metric spaces $(T,d)$ and $(U,d')$ and a map $f: (T,d) \to (U,d')$ which is onto and satisfies

$$\forall x,y\in T,\quad d'(f(x),f(y)) \le A\,d(x,y)$$

for a certain constant $A$. Then $\gamma_{\alpha,\beta}(U,d') \le K(\alpha,\beta)A\gamma_{\alpha,\beta}(T,d)$.

Proof This is really obvious when $f$ is one-to-one. We reduce to that case by considering a map $g: U \to T$ with $f(g(x)) = x$ and replacing $T$ by $g(U)$. □


It remains to deduce Theorem 4.7.1 from Theorem 4.7.2. The argument is purely deterministic and unrelated to any other material in the present book. The basic idea is very simple, and to keep it simple, we describe it in slightly imprecise terms. Consider a union $A$ of little squares of side length $2^{-\ell_1}$ and the union $A'$ of all the little squares that touch $A$ (see Fig. 4.3).

We want to prove that $A'$ contains at least as many points $Y_i$ as $A$ contains points $X_i$, so that by Hall's Marriage Lemma each point $X_i$ can be matched to a point $Y_i$ in the same little square or in a neighbor of it. Since the points $Y_i$ are evenly spread, the number of such points in $A'$ is very nearly $N\lambda(A')$. There may be more than $N\lambda(A)$ points $X_i$ in $A$, but (4.103) tells us that the excess number of points cannot be more than a proportion of the length $\ell$ of the boundary of $A$. The marvelous fact is that we may also expect that $\lambda(A') - \lambda(A)$ is proportional to $\ell$. We may therefore hope that the excess number of points $X_i$ in $A$ does not exceed $N(\lambda(A')-\lambda(A))$, proving the result. The proportionality constant is not quite right to make the argument work, but this difficulty is bypassed simply by applying the same argument to a slightly coarser grid.

When one tries to describe precisely what is meant by the previous argument, one has to check a number of details. This elementary task, which requires patience, is performed in Appendix B.3.
Fig. 4.3 A union A of little squares and the boundary of A'


4.8 Lower Bound for the Leighton-Shor Theorem 


Theorem 4.8.1 If the points $(X_i)_{i\le N}$ are i.i.d. uniform over $[0,1]^2$ and the points $(Y_i)_{i\le N}$ are evenly spread, then

$$\mathsf E\inf_\pi\max_{i\le N}d(X_i,Y_{\pi(i)}) \ge \frac{(\log N)^{3/4}}{L\sqrt N}. \tag{4.116}$$
We consider the class of functions 
1 
C= {f : (0, 1] > [0,1]; f() = fd) =0; i ff (x)dx < 1} ; (4.117) 
0 


For $f \in \mathcal{C}$, we consider its subgraph

$$S(f) = \{(x,y) \in [0,1]^2;\ y \le f(x)\}. \qquad (4.118)$$


To prove (4.116), the key step will be to show that with high probability we may find $f \in \mathcal{C}$ with

$$\operatorname{card}\{i \le N;\ X_i \in S(f)\} \ge N\lambda(S(f)) + \frac{1}{L}\sqrt{N}(\log N)^{3/4}. \qquad (4.119)$$




With a little more work, we could actually prove that we can find such a function $f$ which moreover satisfies $|f'| \le 1$. This extra work is not needed. The key property of $f$ here is that its graph has a bounded length, and this is already implied by the condition $\|f'\|_2 \le 1$: since $\sqrt{1+a^2} \le 1 + |a|$ and $\int_0^1 |f'(x)|\,dx \le \|f'\|_2 \le 1$, the length of this graph is $\int_0^1 \sqrt{1 + f'^2(x)}\,dx \le 2$.


Lemma 4.8.2 The set of points within distance $\epsilon > 0$ of the graph of $f$ has an area $\le L\epsilon$. The set of points within distance $\epsilon > 0$ of $S(f)$ has an area $\le \lambda(S(f)) + L\epsilon$.


Proof The graph of $f \in \mathcal{C}$ has length $\le 2$. One can find a subset of the graph of $f$ of cardinality $\le L/\epsilon$ such that each point of the graph is within distance $\epsilon$ of this set.$^{18}$ A point within distance $\epsilon$ of the graph then belongs to one of $L/\epsilon$ balls of radius $2\epsilon$, and these balls have total area $\le (L/\epsilon)\cdot 4\pi\epsilon^2 = L\epsilon$. This proves the first assertion. The second assertion follows from the fact that a point which is within distance $\epsilon$ of $S(f)$ either belongs to $S(f)$ or is within distance $\epsilon$ of the graph of $f$. □
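The length bound and the construction of footnote 18 can be checked numerically. The Python sketch below approximates the graph by a fine polyline, measures its length, and keeps the first sample reached at each arc-length mark $0, \epsilon, 2\epsilon, \ldots$, as in footnote 18. The test function $f(x) = (\sqrt{2}/\pi)\sin(\pi x)$, which satisfies $f(0) = f(1) = 0$ and $\int_0^1 f'^2 = 1$, the sampling resolution, and the function names are our own illustrative choices. The checks confirm that the length is at most 2, that at most $2/\epsilon + 1$ points are kept, and that every sampled point of the graph lies within $\epsilon$ of the kept points.

```python
import math


def graph_polyline(f, n=20000):
    """Sample the graph of f on [0, 1] at n + 1 equally spaced abscissas."""
    return [(i / n, f(i / n)) for i in range(n + 1)]


def arc_length_net(points, eps):
    """Keep the first sample reached at each arc-length mark 0, eps, 2 eps, ...

    Returns the kept points and the total length of the polyline.
    """
    net = [points[0]]
    s = 0.0
    for p, q in zip(points, points[1:]):
        s += math.dist(p, q)
        if s >= len(net) * eps:  # the mark k * eps, with k = len(net), is reached
            net.append(q)
    return net, s


def f_example(x):
    """Illustrative member of C: f(0) = f(1) = 0 and the integral of f'^2 equals 1."""
    return math.sqrt(2) / math.pi * math.sin(math.pi * x)
```

Since Euclidean distance is at most arc-length distance, consecutive kept points roughly $\epsilon$ apart along the curve suffice for the covering property.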


Proof of Theorem 4.8.1 We prove that when there exists a function $f$ satisfying (4.119), then $\inf_\pi\max_{i\le N} d(X_i, Y_{\pi(i)}) \ge (\log N)^{3/4}/(L\sqrt{N})$. Let us denote by $S(f)_\epsilon$ the $\epsilon$-neighborhood$^{19}$ of $S(f)$ in $[0,1]^2$. We first observe that for any $f \in \mathcal{C}$, we have


$$\operatorname{card}\{i \le N;\ Y_i \in S(f)_\epsilon\} \le N\lambda(S(f)) + L\epsilon N + L\sqrt{N}. \qquad (4.120)$$


This is because, by definition of an evenly spread family, each point $Y_i$ belongs to a small rectangle $R_i$ of area $1/N$ and of diameter $\le 10/\sqrt{N}$, and a pessimistic upper bound for the left-hand side of (4.120) is the number of such rectangles that intersect $S(f)_\epsilon$. These rectangles are entirely contained in the set of points within distance $L/\sqrt{N}$ of $S(f)_\epsilon$, i.e., in the set of points within distance $\le \epsilon + L/\sqrt{N}$ of $S(f)$, and by Lemma 4.8.2, this set has area $\le \lambda(S(f)) + L\epsilon + L/\sqrt{N}$. Since each of these rectangles has area $1/N$ and they do not overlap, their number is at most $N$ times this area, hence the bound (4.120).

Consequently (and since we may assume that $N$ is large enough), (4.119) implies that for $\epsilon = (\log N)^{3/4}/(L\sqrt{N})$, it holds that


$$\operatorname{card}\{i \le N;\ Y_i \in S(f)_\epsilon\} < \operatorname{card}\{i \le N;\ X_i \in S(f)\},$$


and therefore any matching must pair at least one point $X_i \in S(f)$ with a point $Y_{\pi(i)} \notin S(f)_\epsilon$, so that $\max_{i\le N} d(X_i, Y_{\pi(i)}) \ge \epsilon$. □


Recalling the functions $f_{k,\ell}$ of (4.82), we consider now an integer $c \ge 2$ which will be determined later. The purpose of $c$ is to give us room. Thus, by (4.87),

$$\int_0^1 f_{ck,\ell}(x)\,dx = \frac{2^{-2ck}}{L}. \qquad (4.121)$$


18 This is true for any curve of length 2. If one considers a parameterization $\gamma(t)$, $0 \le t \le 2$, of the curve by arc length, the points $\gamma(k\epsilon)$ for $k \le 2/\epsilon$ have this property.
19 That is, the set of points within distance $\le \epsilon$ of a point of $S(f)$.
Let us set

$$\tilde f_{k,\ell} = \frac{1}{\sqrt{r}}f_{ck,\ell}.$$
Consider the functions of the type

$$f = \sum_{k\le r} f_k \quad\text{with}\quad f_k = \sum_{1\le\ell\le 2^{ck}} x_{k,\ell}\tilde f_{k,\ell}, \qquad (4.122)$$

where $x_{k,\ell} \in \{0,1\}$. Then $f(0) = f(1) = 0$.
Lemma 4.8.3 A function $f$ of the type (4.122) satisfies

$$\int_0^1 f'(x)^2\,dx \le 1. \qquad (4.123)$$
0 


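Proof Using (4.83) and (4.84), we obtain

$$\int_0^1 f'(x)^2\,dx = \sum_{k\le r}\sum_{\ell\le 2^{ck}} x_{k,\ell}^2\int_0^1 \tilde f_{k,\ell}'(x)^2\,dx \le \sum_{k\le r}\sum_{\ell\le 2^{ck}}\frac{2^{-ck}}{r} = 1.$$

□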


Consequently, each function of the type (4.122) belongs to the class $\mathcal{C}$ of (4.117).
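Lemma 4.8.3 rests on the orthogonality relations (4.83) and (4.84), which are stated earlier in the chapter and not reproduced here. The Python sketch below assumes, purely for illustration, that $f_{K,\ell}$ is the tent function on the $\ell$-th dyadic interval of length $2^{-K}$ with slopes $+1$ and $-1$ (so that $\int_0^1 f_{K,\ell}'^2 = 2^{-K}$). This may differ from the exact normalization of (4.82). Under this assumption the sketch computes $\int_0^1 f'^2$ exactly on the finest dyadic grid and compares it with the value $\sum_{k\le r}\sum_\ell x_{k,\ell}2^{-ck}/r$ predicted by orthogonality. The helper names are hypothetical.

```python
import math
import random


def random_coefficients(c, r, seed=0):
    """Hypothetical helper: random x_{k,l} in {0, 1} for 1 <= k <= r, 0 <= l < 2**(c*k)."""
    rng = random.Random(seed)
    return {k: [rng.randint(0, 1) for _ in range(2 ** (c * k))] for k in range(1, r + 1)}


def dirichlet_energy(c, r, coeffs):
    """Integral of f'(x)**2 over [0, 1] for f = sum_k sum_l x_{k,l} f_{ck,l} / sqrt(r).

    Assumption (illustrative only): f_{K,l} is the tent function on the dyadic
    interval [l 2^-K, (l+1) 2^-K] (l counted from 0 here), with slope +1 on its
    left half and -1 on its right half.
    """
    cells = 2 ** (c * r + 1)      # finest grid: every f' is constant on each cell
    deriv = [0.0] * cells
    slope = 1 / math.sqrt(r)
    for k in range(1, r + 1):
        width = cells >> (c * k)  # fine cells per dyadic interval of length 2^-ck
        half = width // 2
        for l, x in enumerate(coeffs[k]):
            if x:
                start = l * width
                for j in range(start, start + half):
                    deriv[j] += slope
                for j in range(start + half, start + width):
                    deriv[j] -= slope
    return sum(v * v for v in deriv) / cells  # each cell has length 1 / cells


def predicted_energy(c, r, coeffs):
    """Value given by orthogonality: sum_k sum_l x_{k,l}^2 2^(-ck) / r."""
    return sum(sum(coeffs[k]) * 2.0 ** (-c * k) / r for k in coeffs)
```

With all $x_{k,\ell} = 1$ the energy equals exactly 1, which is the extreme case of (4.123).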


Proof of (4.119) Given $N$ large, we choose $r$ as the largest integer for which $2^{cr} \le N^{1/100}$, so that $r \ge \log N/(Lc)$. The construction of the functions $f_k$ is inductive. Assume that we have already constructed $f_1, \ldots, f_q$, and let $g = \sum_{k\le q} f_k$. For $\ell \le 2^{c(q+1)}$, let us consider the region

$$R_\ell = S(g + \tilde f_{q+1,\ell}) \setminus S(g),$$

so that by (4.121)

$$\lambda(R_\ell) = \frac{2^{-2c(q+1)}}{L\sqrt{r}}. \qquad (4.124)$$


These regions are disjoint because the functions $\tilde f_{q+1,\ell}$ have disjoint support. Furthermore, if we choose $f_{q+1} = \sum_{\ell\le 2^{c(q+1)}} x_{q+1,\ell}\tilde f_{q+1,\ell}$ where $x_{q+1,\ell} \in \{0,1\}$, then we have

$$S(g + f_{q+1}) \setminus S(g) = \bigcup_{\ell\in J} R_\ell,$$

where

$$J = \{\ell \le 2^{c(q+1)};\ x_{q+1,\ell} = 1\},$$

and thus

$$\lambda(S(g + f_{q+1}) \setminus S(g)) = \sum_{\ell\in J}\lambda(R_\ell). \qquad (4.125)$$


Since our goal is to construct functions such that there is an excess of points $X_i$ in their subgraph, we do the obvious thing: we take $x_{q+1,\ell} = 1$ if there is an excess of points $X_i$ in $R_\ell$, that is, if

$$\delta_\ell := \operatorname{card}\{i \le N;\ X_i \in R_\ell\} - N\lambda(R_\ell) \ge 0, \qquad (4.126)$$

and otherwise we set $x_{q+1,\ell} = 0$. We have, recalling (4.125),


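$$\operatorname{card}\{i \le N;\ X_i \in S(g + f_{q+1}) \setminus S(g)\} = \sum_{\ell\in J}\operatorname{card}\{i \le N;\ X_i \in R_\ell\} = \sum_{\ell\in J}\delta_\ell + N\lambda(S(g + f_{q+1}) \setminus S(g)). \qquad (4.127)$$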


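We will show that with high probability, we have $\sum_{\ell\in J}\delta_\ell \ge \sqrt{N}/(Lr^{1/4})$. Recalling that $g = \sum_{k\le q} f_k$ and $g + f_{q+1} = \sum_{k\le q+1} f_k$, summation of the inequalities (4.127) over $q < r$ then proves (4.119) (since $r\sqrt{N}/(Lr^{1/4}) = \sqrt{N}r^{3/4}/L$ and $r \ge \log N/(Lc)$), where $f$ is the function

$$f = \sum_{k\le r} f_k.$$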


Let us say that the region $R_\ell$ is favorable if

$$\delta_\ell \ge \sqrt{N\lambda(R_\ell)}/L^* = 2^{-c(q+1)}\sqrt{N}/(Lr^{1/4}),$$

where the universal constant $L^*$ will be determined later. The idea underlying this definition is that given a subset $A$ of the square, with $1/N \le \lambda(A) \le 1/2$, the number of points $X_i$ which belong to $A$ has typical fluctuations of order $\sqrt{N\lambda(A)}$. Since $\delta_\ell \ge 0$ for $\ell \in J$ and since by construction $\ell \in J$ when $R_\ell$ is favorable, we have


$$\sum_{\ell\in J}\delta_\ell \ge \operatorname{card}\{\ell;\ R_\ell \text{ favorable}\} \times 2^{-c(q+1)}\sqrt{N}/(Lr^{1/4}).$$
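The heuristic that the number of points $X_i$ in a set $A$ fluctuates on the scale $\sqrt{N\lambda(A)}$ can be tested exactly, since this number is binomial with parameters $N$ and $\lambda(A)$. The Python sketch below computes exactly, in log space, the probability that appears in (4.128) below. The choice $L^* = 4$ and the sample values of $N$ and $\lambda(A)$ are our own illustrations, not constants from the text. With $L^* = 2$ and $\lambda(A) = 1/2$ the probability drops slightly below $1/4$, which shows that the constant $L^*$ cannot be taken too small.

```python
import math


def binom_upper_tail(n, p, k_min):
    """Exact P(S >= k_min) for S ~ Binomial(n, p), each term computed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(max(k_min, 0), n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return total


def excess_probability(n, lam, l_star):
    """P(card{i <= n; X_i in A} - n*lam >= sqrt(n*lam) / l_star) when lambda(A) = lam."""
    threshold = n * lam + math.sqrt(n * lam) / l_star
    return binom_upper_tail(n, lam, math.ceil(threshold))
```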


To conclude the proof, it then suffices to show that with overwhelming probability at least a fixed proportion of the regions $R_\ell$ for $\ell \le 2^{c(q+1)}$ are favorable. One has to be cautious that the r.v.s $X_i$ are not independent of the function $g$ and of the regions $R_\ell$, because in particular the construction of $g$ uses the values of the $X_i$.
One simple way around that difficulty is to proceed as follows: There are at most $\prod_{k\le q}2^{2^{ck}} \le 2^{2^{cq+1}}$ possibilities for $g$. To each of these possibilities corresponds a family of $2^{c(q+1)}$ regions $R_\ell$. If we can ensure that with overwhelming probability for each of these families at least a fixed proportion of the $R_\ell$ are favorable, we are done. Since there are at most $2^{2^{cq+1}}$ families, it suffices to prove that for a given family, this fails with probability $\le 2^{-2^{cq+2}}$. To achieve this, we proceed as follows: by normal approximation of the tails of the binomial law, there exist a constant $L^*$ and a number $N_0 > 0$ such that given any set $A \subset [0,1]^2$ with $\lambda(A) \le 1/2$ and $N\lambda(A) \ge N_0$, we have

$$\mathsf{P}\Big(\operatorname{card}\{i \le N;\ X_i \in A\} - N\lambda(A) \ge \sqrt{N\lambda(A)}/L^*\Big) \ge 1/4. \qquad (4.128)$$


Since $c$ is a universal constant and $2^{cr} \le N^{1/100}$, (4.124) shows that $N\lambda(R_\ell)$ becomes large with $N$. In particular (4.128) shows that the probability that a given region $R_\ell$ is favorable is $\ge 1/4$. Now, using Poissonization, we can pretend that these probabilities are independent as $\ell$ varies. As noted in (4.98), given $M$ independent r.v.s $Z_i \in \{0,1\}$ with $\mathsf{P}(Z_i = 1) \ge 1/4$, we have $\mathsf{P}(\sum_{i\le M} Z_i \le M/8) \le \exp(-\beta M)$ for some universal constant $\beta$. Since here we have $M = 2^{c(q+1)}$, then $\exp(-\beta M) = \exp(-\beta 2^{c-2}2^{cq+2})$. This is $\le 2^{-2^{cq+2}}$ as required provided we have chosen $c$ large enough that $\beta 2^{c-2} \ge 1$. □
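The estimate quoted from (4.98) is a Chernoff bound. Taking $\mathsf{P}(Z_i = 1)$ exactly $1/4$ is the worst case, and then one may take $\beta = D(1/8\,\|\,1/4)$, the relative entropy of two Bernoulli laws. The Python sketch below (our own computation; the resulting value $c = 7$ is only valid for this particular $\beta$, not a constant from the text) compares this bound with exact binomial probabilities and finds the smallest $c$ with $\beta 2^{c-2} \ge 1$.

```python
import math


def binom_lower_tail(m, p, k_max):
    """Exact P(S <= k_max) for S ~ Binomial(m, p)."""
    return sum(math.comb(m, k) * p ** k * (1 - p) ** (m - k) for k in range(k_max + 1))


def kl_bernoulli(a, p):
    """Relative entropy D(a || p) between Bernoulli(a) and Bernoulli(p)."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))


# Chernoff: if P(Z_i = 1) = 1/4, then P(sum Z_i <= M/8) <= exp(-beta * M) with
beta = kl_bernoulli(1 / 8, 1 / 4)  # about 0.048

# Smallest c with beta * 2**(c - 2) >= 1, the condition used at the end of the proof
c = 2
while beta * 2 ** (c - 2) < 1:
    c += 1
```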


4.9 For the Expert Only 


Having proved both the Ajtai-Komlós-Tusnády and the Leighton-Shor matching theorems, we should not fall under the illusion that we understand everything about matchings. The most important problem left is arguably the ultimate matching conjecture, stated later as Problem 17.1.2. A first step in that direction would be to answer the following question:$^{20}$


Question 4.9.1 Can we find a matching which achieves simultaneously both (4.36) 
and (4.101)? 


The existence of such a matching does not seem to be of any particular importance, but the challenge is that the Ajtai-Komlós-Tusnády (AKT) theorem and the Leighton-Shor matching theorem are proved by rather different routes, and it is far from obvious to find a common proof.

20 The difference between a problem and a question is that a question is permitted to sound less central.

In the rest of the section, we discuss a special matching result. Consider the space $T = \{0,1\}^{\mathbb{N}^*}$ provided with the distance $d(t,t') = 2^{-j}$, where $j = \min\{i \ge 1;\ t_i \ne t'_i\}$ for $t = (t_i)_{i\ge1}$. This space somewhat resembles the unit interval, in the sense that $N(T,d,\epsilon) \le L\epsilon^{-1}$ for $\epsilon \le 1$. The space of Exercise 4.5.23 is essentially the space $T \times T$. The AKT theorem tells us what happens for matchings in $[0,1]^2$, and Exercise 4.5.23 tells us what happens for matchings in $T^2$. But what happens in the space $U := [0,1] \times T$? It does not really matter which specific sensible distance we use on $U$; let us say that we define $d((x,t),(x',t')) = |x - x'| + d(t,t')$.
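The covering estimate $N(T,d,\epsilon) \le L\epsilon^{-1}$ can be checked by brute force on the finite truncation $\{0,1\}^m$ (the space $T_m$ introduced below). For $\epsilon = 2^{-n}$, the closed ball of radius $\epsilon$ around $t$ consists of the points that agree with $t$ in the first $n-1$ coordinates, so exactly $2^{n-1} = \epsilon^{-1}/2$ balls are needed. The Python sketch below (function names are ours) confirms this count with a greedy cover. In this ultrametric setting the greedy cover is optimal, because two closed balls of the same radius are either equal or disjoint.

```python
from itertools import product


def ultrametric(t, s):
    """d(t, s) = 2**-j where j is the first (1-based) index with t_j != s_j."""
    for j, (a, b) in enumerate(zip(t, s), start=1):
        if a != b:
            return 2.0 ** -j
    return 0.0


def greedy_cover_count(points, eps):
    """Number of closed eps-balls used by a greedy cover of the points."""
    uncovered = list(points)
    count = 0
    while uncovered:
        center = uncovered[0]
        uncovered = [s for s in uncovered if ultrametric(center, s) > eps]
        count += 1
    return count


T6 = list(product((0, 1), repeat=6))  # the truncation {0, 1}^6
```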


Theorem 4.9.2 The expected cost of the optimal matching of $N$ random i.i.d. uniformly distributed$^{21}$ points in $U$ with $N$ evenly spread points is exactly of order

$$\sqrt{N}(\log N)^{3/4}.$$


The appealing part of this special result is of course the fractional power of the logarithm.
This result is as pretty as almost anything found in this book, but its special nature 
makes it appropriate to guide the (expert) reader to the proof through exercises. 

Let us start with a finite approximation of $T$. We consider the space $T_m = \{0,1\}^m$ provided with the distance defined for $t \ne t'$ by $d(t,t') = 2^{-j}$, where $j = \min\{i \ge 1;\ t_i \ne t'_i\}$ for $t = (t_i)_{i\le m}$. We set $U_m = [0,1] \times T_m$, and we denote by $\theta$ the uniform measure on $U_m$. Surely the reader who has reached this stage knows how to deduce$^{22}$ the upper bound of Theorem 4.9.2 from the following:


Theorem 4.9.3 The set $\mathcal{L}$ of 1-Lipschitz functions $f$ on $U_m$ which satisfy $|f| \le 1$ satisfies $\gamma_2(\mathcal{L}, d_2) \le Lm^{3/4}$.


Here of course $\mathcal{L}$ is seen as a subset of $L^2(U_m, \theta)$. The proof of Theorem 4.9.3 will use an expansion of the elements of $\mathcal{L}$ on a suitable basis. Using the same method as in Lemma 4.5.12, one can assume furthermore that the functions of $\mathcal{L}$ are zero on $\{0\} \times T_m$ and $\{1\} \times T_m$. For $0 \le n \le m$, we consider the natural partition $\mathcal{C}_n$ of $T_m$ into $2^n$ sets obtained by fixing the first $n$ coordinates of $t \in T_m$. Denoting by $\mu_m$ the uniform measure on $T_m$, for $C \in \mathcal{C}_n$, we have $\mu_m(C) = 2^{-n}$. A set $C \in \mathcal{C}_n$ with $n < m$ is the union of two sets $C_1$ and $C_2$ in $\mathcal{C}_{n+1}$. We denote by $h_C$ a function on $T_m$ which equals $2^{n/2}$ on one of these sets, $-2^{n/2}$ on the other, and 0 outside $C$. Consider also the function $h_\emptyset$ on $T_m$, constant equal to 1. In this manner, we obtain an orthogonal basis $(h_C)$ of $L^2(T_m, \mu_m)$. For $f \in \mathcal{L}$, we consider the coefficients of $f$ on this basis,

$$a_{p,C}(f) = \int\exp(2i\pi px)h_C(t)f(x,t)\,dx\,d\mu_m(t).$$

Here and always, $p \in \mathbb{Z}$, $0 \le n < m$ and $C \in \mathcal{C}_n$, or $p \in \mathbb{Z}$ and $C = \emptyset$. We will lighten notation by writing simply $\sum_{p,C}$ for sums over all possible values of $(p,C)$ as above.
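As a sanity check that $(h_C)$ is an orthogonal basis (in fact an orthonormal one) of $L^2(T_m, \mu_m)$, the following Python sketch (assuming NumPy; the ordering conventions and the name `haar_basis` are ours) builds all functions $h_C$. It takes $h_C = 2^{n/2}$ where $t_{n+1} = 0$, $h_C = -2^{n/2}$ where $t_{n+1} = 1$, and $h_C = 0$ outside $C$, and then checks that their Gram matrix for the uniform measure is the identity. There are $1 + \sum_{n<m}2^n = 2^m$ such functions, matching the dimension of $L^2(T_m, \mu_m)$.

```python
import numpy as np


def haar_basis(m):
    """Rows are the functions h_C on T_m = {0,1}^m; point index i has binary
    digits t_1 ... t_m, with t_1 the most significant digit."""
    size = 2 ** m
    rows = [np.ones(size)]                 # h_emptyset, constant equal to 1
    for n in range(m):                     # C in C_n with 0 <= n < m
        block = 2 ** (m - n)               # number of points in each C
        for c in range(2 ** n):            # C = points with a given first-n prefix
            h = np.zeros(size)
            start = c * block
            h[start:start + block // 2] = 2 ** (n / 2)            # C_1: t_{n+1} = 0
            h[start + block // 2:start + block] = -2 ** (n / 2)   # C_2: t_{n+1} = 1
            rows.append(h)
    return np.array(rows)
```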


21 It should be obvious what is meant by “uniform probability on $U$”.
22 By following the scheme of proof of (4.43).
Exercise 4.9.4

(a) Prove that

$$\sum_{p,C} p^2|a_{p,C}|^2 \le L. \qquad (4.129)$$

Hint: Just use that $|\partial f/\partial x| \le 1$.
(b) Prove that for each $n$ and each $C \in \mathcal{C}_n$, we have

$$\sum_{p\in\mathbb{Z}}|a_{p,C}|^2 \le L2^{-3n}. \qquad (4.130)$$

Hint: Prove that $|\int h_C(t)f(x,t)\,d\mu_m(t)| \le L2^{-3n/2}$.
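The hint in (b) can be understood by pairing each $t \in C_1$ with the point of $C_2$ obtained by flipping its coordinate $t_{n+1}$. These two points are at distance $2^{-(n+1)}$, which gives, for each fixed $x$, $|\int h_C(t)f(x,t)\,d\mu_m(t)| \le 2^{n/2}\cdot 2^{-(n+1)}\cdot 2^{-(n+1)} = 2^{-3n/2}/4$. The Python sketch below (our own illustration, assuming NumPy) checks this bound on $T_6$ for the 1-Lipschitz function $t \mapsto d(t,S)$, where $S$ is a randomly chosen three-point set.

```python
from itertools import product

import numpy as np

m = 6
points = list(product((0, 1), repeat=m))  # T_m in lexicographic order


def dist(t, s):
    """Ultrametric d(t, s) = 2**-j, j the first index (1-based) where t and s differ."""
    for j, (a, b) in enumerate(zip(t, s), start=1):
        if a != b:
            return 2.0 ** -j
    return 0.0


rng = np.random.default_rng(0)
anchors = [points[i] for i in rng.choice(len(points), size=3, replace=False)]
f = [min(dist(t, a) for a in anchors) for t in points]  # t -> d(t, S) is 1-Lipschitz


def coefficient(prefix):
    """Integral of h_C f with respect to mu_m, for C = {t : (t_1, ..., t_n) = prefix}."""
    n = len(prefix)
    total = 0.0
    for t, ft in zip(points, f):
        if t[:n] == prefix:
            sign = 1.0 if t[n] == 0 else -1.0  # C_1: t_{n+1} = 0, C_2: t_{n+1} = 1
            total += sign * 2 ** (n / 2) * ft
    return total / len(points)


worst_ratio = max(
    abs(coefficient(prefix)) / (2 ** (-1.5 * n) / 4)
    for n in range(m)
    for prefix in product((0, 1), repeat=n)
)
```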


We have just shown that $\mathcal{L}$ is isometric to a subset of the set $\mathcal{A}$ of sequences $(a_{p,C})$ which satisfy (4.129) and (4.130).


Exercise 4.9.5 We will now show that $\gamma_2(\mathcal{A}) \le Lm^{3/4}$.


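(a) Prove that $\mathcal{A}$ is contained in an ellipsoid of the type

$$\mathcal{E} = \Big\{(a_{p,C});\ \sum_{p,C}\alpha_{p,C}^2|a_{p,C}|^2 \le 1\Big\},$$

where $\alpha_{p,C}^2 = (p^2 + 2^{2n}/m)/L$ if $C \in \mathcal{C}_n$, $n \ge 0$, and $\alpha_{p,\emptyset}^2 = p^2/L$.
(b) Conclude using (2.155). (The reader must be careful about the unfortunate clash of notation.)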


The goal of the next exercise is to prove the lower bound in Theorem 4.9.2. This 
lower bound is obtained by a non-trivial twist on the proof of the lower bound for 
the AKT theorem, so you must fully master that argument to have a chance. 


Exercise 4.9.6 Let us recall the functions $f_{q,\ell}$ of (4.82) where we take $r \simeq (\log N)/100$. For $n \ge 0$, we still consider the natural partition $\mathcal{C}_n$ of $T$ into $2^n$ sets obtained by fixing the first $n$ coordinates of $t \in T$. We consider an integer $p$ with $2^{-p} \simeq 1/\sqrt{r}$. For each $q$, each $\ell \le 2^q$, and each $C \in \mathcal{C}_{q+p}$, we consider the function $f_{q,\ell,C}$ on $U$ given by $f_{q,\ell,C}(x,t) =$ 2?) 6 slayilel). We consider functions of the type $f_q = \sum_{\ell\le 2^q,\,C\in\mathcal{C}_{q+p}} z_{q,\ell,C}f_{q,\ell,C}$ where $z_{q,\ell,C} \in \{0,1,-1\}$. Copy the proof of the lower bound of the AKT theorem to prove that with high probability, one can construct these functions such that $\sum_{q\le r} f_q$ is 1-Lipschitz, and for each $q$, $\sum_{i\le N}(f_q(X_i) - \int f_q\,d\theta) \ge \sqrt{N}/(Lr^{1/4})$, where the $X_i$ are i.i.d. uniform on $U$ and $\theta$ is the uniform probability measure on $U$. Summation over $q \le r$ yields the desired result.
While making futile attempts in the direction of Theorem 4.9.2, we came upon further questions which we cannot answer. We describe one of these now. We recall the functionals $\gamma_{\alpha,\beta}$ of (4.5) and the uniform measure $\mu_m$ on $T_m$.


Question 4.9.7 Is it true that for any metric space $(T,d)$ the space $\mathcal{U}$ of 1-Lipschitz maps $f$ from $T_m$ to $T$, provided with the distance $D$ given by $D(f,f')^2 = \int_{T_m} d(f(s),f'(s))^2\,d\mu_m(s)$, satisfies $\gamma_2(\mathcal{U}, D) \le Lm^{3/4}\gamma_{1,2}(T)$?


The motivation for this question is that if $T$ is the set of 1-Lipschitz functions on $[0,1]$, then $\gamma_{1,2}(T) \le L$ (using the Fourier transform to compare with an ellipsoid), and with minimal effort, this would provide an alternate and more conceptual proof for the upper bound of Theorem 4.9.2.


Exercise 4.9.8 In the setting of Question 4.9.7, assume that $e_n(T,d) \le 2^{-n}$. Prove that $e_{2n}(\mathcal{U}, D) \le L2^{-n}$ (and better if $n \ge m$). Conclude that $\sum_{n\ge0}2^{n/2}e_n(\mathcal{U}, D) \le Lm$. Prove that $\gamma_2(\mathcal{U}, D) \le Lm\gamma_2(T)$.


Key Ideas to Remember

• Ellipsoids in a Hilbert space are in a sense smaller than their entropy numbers indicate. This is true more generally for sufficiently convex sets in a Banach space. This phenomenon explains the fractional powers of logarithms occurring in the most famous matching theorems.

• The size of ellipsoids is sometimes accurately described by using proper generalizations $\gamma_{\alpha,\beta}(T,d)$ of the basic functional $\gamma_2(T,d)$.

• Matching theorems are typically proved through a discrepancy bound, which evaluates the supremum of the empirical process over a class $\mathcal{F}$ of functions.

• Bernstein's inequality is a convenient tool to prove discrepancy bounds. It involves the control of $\mathcal{F}$ both for the $L^2$ and the supremum distance.

• Using two different distances reveals the power of approaching chaining through sequences of partitions.


4.10 Notes and Comments 


The original proof of the Leighton-Shor theorem amounts basically to performing by hand a kind of generic chaining in this highly non-trivial case, an incredible tour de force.$^{23}$ A first attempt was made in [92] to relate (an important consequence of) the Leighton-Shor theorem to general methods for bounding stochastic processes, but it runs into technical complications. Coffman and Shor [26] then introduced the use of Fourier transforms and brought to light the role of ellipsoids, after which it became clear that the structure of these ellipsoids plays a central part in these matching results, a point of view systematically expounded in [114].


23 There is a simple explanation as to why this was possible: as you can check through Wikipedia, 
both authors are geniuses. 
Chapter 17 is a continuation of the present chapter. The more difficult material it
contains is presented later for fear of scaring readers at this early stage. A notable 
feature of the result presented there is that ellipsoids do not suffice, a considerable 
source of complication. The material of Appendix A is closely related to the 
Leighton-Shor theorem. 

The original results of [3] are proved using an interesting technique called the transportation method. A version of this method, which avoids many of the technical difficulties of the original approach, is presented in [134]. With the notation of Theorem 4.5.1, it is proved in [134] (a stronger version of) the fact that with probability $\ge 9/10$, one has

$$\inf_\pi\frac{1}{N}\sum_{i\le N}\exp\Big(\frac{Nd(X_i, Y_{\pi(i)})^2}{L\log N}\Big) \le 2. \qquad (4.131)$$

Since $\exp x \ge x$, (4.131) implies that $\sum_{i\le N} d(X_i, Y_{\pi(i)})^2 \le L\log N$ and hence, using the Cauchy-Schwarz inequality, $\sum_{i\le N} d(X_i, Y_{\pi(i)}) \le L\sqrt{N\log N}$. Moreover, (4.131) also implies $\max_{i\le N} d(X_i, Y_{\pi(i)}) \le L\log N/\sqrt{N}$. This unfortunately fails to bring a positive answer to Question 4.9.1.
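The three consequences of (4.131) are elementary, and the Python sketch below illustrates them under our own assumptions: the constant $L$ in (4.131) is set to 1, and the distances $d_i = d(X_i, Y_{\pi(i)})$ are random numbers rescaled by bisection so that (4.131) holds. It then checks $\sum_i d_i^2 \le 2L\log N$ (from $\exp x \ge x$), $\sum_i d_i \le \sqrt{2LN\log N}$ (Cauchy-Schwarz), and $\max_i d_i \le \sqrt{L\log N\,\log(2N)/N}$ (a single term of the sum is at most $2N$). The helper names are hypothetical.

```python
import math
import random


def average_exponential(dists, L):
    """Left-hand side of (4.131) for given matched distances, with constant L."""
    n = len(dists)
    # cap the exponent to avoid overflow; exp(700) already exceeds 2n by far
    return sum(math.exp(min(n * d * d / (L * math.log(n)), 700.0)) for d in dists) / n


def rescale_to_4131(dists, L):
    """Multiply dists by the largest factor (found by bisection) keeping (4.131) true."""
    lo, hi = 0.0, 1.0
    while average_exponential([hi * d for d in dists], L) <= 2:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if average_exponential([mid * d for d in dists], L) <= 2:
            lo = mid
        else:
            hi = mid
    return [lo * d for d in dists]


rng = random.Random(1)
N, L = 1000, 1.0
d = rescale_to_4131([rng.random() / math.sqrt(N) for _ in range(N)], L)
```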

For results about matching for unbounded distributions, see the work of J. Yukich 
[146] as well as the nonstandard results of [133]. 


Part II 
Some Dreams Come True 


Chapter 5
Warming Up with p-Stable Processes


Later, in Chap. 11, we will prove far-reaching generalizations of the results of
Chap. 2 to many processes which are not close to Gaussian processes. Getting there 
will require many new ideas, and in this chapter, we will present some of them in a 
setting which remains close to that of Gaussian processes. 


5.1 p-Stable Processes as Conditionally Gaussian Processes 


Consider a number $0 < p \le 2$. A r.v. $X$ is called (real, symmetric) $p$-stable if for each $\lambda \in \mathbb{R}$, we have

$$\mathsf{E}\exp i\lambda X = \exp\Big(-\frac{\sigma^p|\lambda|^p}{2}\Big), \qquad (5.1)$$


where $\sigma = \sigma_p(X) \ge 0$ is called the parameter of $X$. The name “$p$-stable” comes from the fact that if $X_1, \ldots, X_m$ are independent and $p$-stable, then for real numbers $a_j$, the r.v. $\sum_{j\le m} a_jX_j$ is also $p$-stable and

$$\sigma_p\Big(\sum_{j\le m} a_jX_j\Big) = \Big(\sum_{j\le m}|a_j|^p\sigma_p(X_j)^p\Big)^{1/p}. \qquad (5.2)$$

This is obvious from (5.1).
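Identity (5.2) can also be observed numerically. The Python sketch below (assuming NumPy) samples $p$-stable variables with the normalization (5.1). It uses the Chambers-Mallows-Stuck formula, a standard simulation method not discussed in the text, for a variable $Z$ with $\mathsf{E}\exp i\lambda Z = \exp(-|\lambda|^p)$, and then sets $X = 2^{-1/p}\sigma Z$. It compares the empirical value of $\mathsf{E}\cos(\lambda\sum_j a_jX_j)$ with $\exp(-\sigma^p|\lambda|^p/2)$ for the parameter $\sigma$ predicted by (5.2). The particular numbers are arbitrary.

```python
import numpy as np


def symmetric_stable(p, sigma, size, rng):
    """Samples X with E exp(i lam X) = exp(-sigma**p |lam|**p / 2), as in (5.1), 0 < p < 2.

    Chambers-Mallows-Stuck: Z below satisfies E exp(i lam Z) = exp(-|lam|**p);
    then X = sigma * 2**(-1/p) * Z has the normalization (5.1).
    """
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    z = (np.sin(p * v) / np.cos(v) ** (1 / p)
         * (np.cos((1 - p) * v) / w) ** ((1 - p) / p))
    return sigma * 2 ** (-1 / p) * z


def combined_parameter(a, sigmas, p):
    """Right-hand side of (5.2): (sum_j |a_j|^p sigma_p(X_j)^p)^(1/p)."""
    return sum(abs(aj) ** p * sj ** p for aj, sj in zip(a, sigmas)) ** (1 / p)


p, lam = 1.5, 0.7
a, sigmas = (1.0, -2.0, 0.5), (1.0, 0.5, 2.0)
rng = np.random.default_rng(12345)
s = sum(aj * symmetric_stable(p, sj, 200_000, rng) for aj, sj in zip(a, sigmas))
empirical = np.mean(np.cos(lam * s))  # equals E exp(i lam S) by symmetry
predicted = np.exp(-combined_parameter(a, sigmas, p) ** p * lam ** p / 2)
```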

The reason for the restriction $p \le 2$ is that for $p > 2$, no r.v. satisfies (5.1). The case $p = 2$ is the Gaussian case, which we now understand very well, so from now on we assume $p < 2$. Despite the formal similarity, this is very different from the Gaussian case. It can be shown that

$$\lim_{s\to\infty} s^p\mathsf{P}(|X| \ge s) = c_p\sigma_p(X)^p, \qquad (5.3)$$

where $c_p > 0$ depends on $p$ only. Thus $X$ does not have moments of order $p$, but it has moments of order $q$ for $q < p$. We refer the reader to [53] for a proof of this and for general background on $p$-stable processes.
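The heavy tail (5.3) is visible in simulation. The Python sketch below repeats the Chambers-Mallows-Stuck sampler (with NumPy) and compares $s^p\mathsf{P}(|X| \ge s)$ with $\sigma^p\Gamma(p)\sin(\pi p/2)/\pi$. This expression for $c_p$ under the normalization (5.1) is derived from the classical asymptotics of stable tails, which are not proved in the text, so it should be read as an assumption of the sketch. The tolerance is deliberately loose because only a few hundred samples exceed $s$.

```python
import math

import numpy as np


def symmetric_stable(p, sigma, size, rng):
    """Samples with E exp(i lam X) = exp(-sigma**p |lam|**p / 2) (Chambers-Mallows-Stuck)."""
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    z = (np.sin(p * v) / np.cos(v) ** (1 / p)
         * (np.cos((1 - p) * v) / w) ** ((1 - p) / p))
    return sigma * 2 ** (-1 / p) * z


def tail_constant(p):
    """Assumed value of c_p in (5.3) under (5.1): Gamma(p) sin(pi p / 2) / pi."""
    return math.gamma(p) * math.sin(math.pi * p / 2) / math.pi


p, sigma, s = 1.5, 1.0, 50.0
rng = np.random.default_rng(2021)
x = symmetric_stable(p, sigma, 1_000_000, rng)
empirical = s ** p * np.mean(np.abs(x) >= s)
ratio = empirical / (tail_constant(p) * sigma ** p)
```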

A process $(X_t)_{t\in T}$ is called $p$-stable if, for every family $(\alpha_t)_{t\in T}$ for which only finitely many of the numbers $\alpha_t$ are not 0, the r.v. $\sum_t\alpha_tX_t$ is $p$-stable. We can then define a (quasi-)distance $d$ on $T$ by

$$d(s,t) = \sigma_p(X_s - X_t). \qquad (5.4)$$

When $p > 1$, a $p$-stable r.v. is integrable, and $\mathsf{E}|X|$ is proportional to $\sigma_p(X)$. Thus one can also define an equivalent distance by $d'(s,t) = \mathsf{E}|X_s - X_t|$.

A typical example of a $p$-stable process is given by $X_t = \sum_{i\le n}t_iY_i$ where $t = (t_i)_{i\le n}$ and $(Y_i)_{i\le n}$ are independent $p$-stable r.v.s. It can in fact be shown that this example is generic in the sense that “each $p$-stable process (with a finite index set) can be arbitrarily well approximated by a process of this type”. Assuming further that $\sigma_p(Y_i) = 1$ for each $i$, (5.2) implies that the distance induced by the process is then the $\ell^p$ distance, $d(s,t) = \|s - t\|_p$.

At the heart of this chapter is the fact that a $p$-stable process $(X_t)$ can be represented as a conditionally Gaussian process. That is, we can find two probability spaces $(\Omega, \mathsf{P})$ and $(\Omega', \mathsf{P}')$ and a family $(Y_t)_{t\in T}$ of r.v.s on $\Omega \times \Omega'$ (provided with the product probability), such that

Given any finite subset $U$ of $T$, the joint laws of $(Y_t)_{t\in U}$ and $(X_t)_{t\in U}$ are identical. (5.5)

Given $\omega \in \Omega$, the process $\omega' \mapsto Y_t(\omega, \omega')$ is a centered Gaussian process. (5.6)

This result holds for any value of $p$ with $1 \le p < 2$. A proof is given in Sect. C.3. Our strategy is to study the process $(Y_t)$ as in (5.6) at given $\omega$.$^1$ A remarkable fact is that we do not need to know precisely how the previous representation of the process $(X_t)$ arises. More generally, if you are disturbed by the fact that you have no intuition about $p$-stable processes, do not be discouraged. Our result will not need any understanding of $p$-stable processes beyond what we have already explained.


1 If you have already glanced through the rest of the book, you should be aware that a basic reason the special case of $p$-stable processes is simple is that these processes are conditionally Gaussian. Many more processes of interest (such as infinitely divisible processes) are not conditionally Gaussian, but are conditionally Bernoulli.
5.2 A Lower Bound for p-Stable Processes


The main goal of this chapter is to prove the following: 


Theorem 5.2.1 For $1 < p < 2$, there is a number $K(p)$ such that for any $p$-stable process $(X_t)_{t\in T}$, we have

$$\gamma_q(T,d) \le K(p)\mathsf{E}\sup_{t\in T}X_t, \qquad (5.7)$$

where $q$ is the conjugate exponent of $p$, i.e., $1/q + 1/p = 1$, and where $d$ is as in (5.4).


Certainly this result reminds us of the inequality $\gamma_2(T,d) \le L\mathsf{E}\sup_{t\in T}X_t$ of the majorizing measure theorem (Theorem 2.10.1). A striking difference is that the tails of $p$-stable r.v.s are very large (see (5.3)) and are not relevant to (5.7). The bound (5.7) cannot be reversed.


Exercise 5.2.2

(a) Consider i.i.d. $p$-stable r.v.s $(Y_i)_{i\le N}$ with $\sigma_p(Y_i) = 1$. For $t \in \mathbb{R}^N$, set $X_t = \sum_{i\le N}t_iY_i$. Prove that the distance (5.4) is given by $d(s,t) = (\sum_{i\le N}|s_i - t_i|^p)^{1/p}$.

(b) Let $T = \{(\pm1, \ldots, \pm1)\}$. Prove that in this case the two sides of (5.7) are of the same order.

(c) Let now $T$ consist of the $N$ sequences $(0, \ldots, 0, 1, 0, \ldots, 0)$. Prove that the two sides of (5.7) are not of the same order.


This exercise leaves little hope to compute $\mathsf{E}\sup_{t\in T}X_t$ as a function of the geometry of $(T,d)$ only.

The bound (5.7) cannot be reversed, but it would be a lethal mistake to think that it is “weak”. It provides in fact exact information on some aspects of the process $(X_t)_{t\in T}$, but these aspects are not apparent at first sight, and we will fully understand them only in Chap. 11.

Theorem 5.2.1 has a suitable version for p = 1, which we will state at the end of 
this chapter. 


5.3 Philosophy 


Let us consider the process $(Y_t)$ as in (5.5) and (5.6). We denote by $\mathsf{E}'$ integration in $\mathsf{P}'$ only. Given $\omega$, we consider the random distance $d_\omega$ on $T$ given by

$$d_\omega(s,t) = \big(\mathsf{E}'(Y_s(\omega,\omega') - Y_t(\omega,\omega'))^2\big)^{1/2}. \qquad (5.8)$$
It is the canonical distance associated with the Gaussian process $(Y_t(\omega,\cdot))$. Consider the r.v. $Z = \sup_{t\in T}Y_t$. Then Theorem 2.10.1 implies

$$\gamma_2(T, d_\omega) \le L\mathsf{E}'Z,$$

and taking expectation gives

$$\mathsf{E}\gamma_2(T, d_\omega) \le L\mathsf{E}\mathsf{E}'Z = L\mathsf{E}\sup_{t\in T}X_t. \qquad (5.9)$$

Now, from the information (5.9), how do we gain control of the size of the metric space $(T,d)$? There is a very important principle at work here, which will play a major part in the book. Suppose that we have a set $T$ and that on $T$ we have a distance $d$ and a random distance $d_\omega$.

Principle A If, given $s, t \in T$, it is very rare that the distance $d_\omega(s,t)$ is very much smaller than $d(s,t)$, then some measure of size of $(T,d)$ is controlled from above by the typical value of $\gamma_2(T, d_\omega)$.


We do not expect the reader to fully understand this principle now, but it will become clearer as we repeatedly apply it. In the present case, the property that it is very rare that $d_\omega(s,t)$ is very much smaller than $d(s,t)$ is expressed by the following:


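Lemma 5.3.1 Define $\alpha$ by

$$\frac{1}{\alpha} = \frac{1}{p} - \frac{1}{2}. \qquad (5.10)$$

Then for all $s, t \in T$ and $\epsilon > 0$, we have

$$\mathsf{P}(d_\omega(s,t) \le \epsilon d(s,t)) \le \exp\Big(-\frac{b_p}{\epsilon^\alpha}\Big), \qquad (5.11)$$

where $d$ is the distance (5.4) and where $b_p > 0$ depends on $p$ only.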


Thus, given a pair $(s,t)$, it is rare that $d_\omega(s,t)$ is much smaller than $d(s,t)$. Given two pairs $(s,t)$ and $(s',t')$, we however know nothing about the joint distribution of the r.v.s $d_\omega(s,t)$ and $d_\omega(s',t')$. Still the information contained in this lemma suffices to deduce Theorem 5.2.1 from the majorizing measure theorem (Theorem 2.10.1). This is an occurrence of Principle A.


Proof Since the process $Y_t(\omega,\cdot)$ is Gaussian, we have

$$\mathsf{E}'\exp i\lambda(Y_s - Y_t) = \exp\Big(-\frac{\lambda^2}{2}d_\omega^2(s,t)\Big).$$
Taking expectation, using (5.1), and since the pair $(Y_s, Y_t)$ has the same law as the pair $(X_s, X_t)$, we get

$$\exp\Big(-\frac{|\lambda|^p}{2}d^p(s,t)\Big) = \mathsf{E}\exp\Big(-\frac{\lambda^2}{2}d_\omega^2(s,t)\Big). \qquad (5.12)$$


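By Markov's inequality, for any r.v. $Z$ and any $u$, we have

$$\mathsf{P}(Z \le u)\exp\Big(-\frac{\lambda^2u}{2}\Big) \le \mathsf{E}\exp\Big(-\frac{\lambda^2}{2}Z\Big).$$

Using this for $Z = d_\omega^2(s,t)$ and $u = \epsilon^2d^2(s,t)$, we get, using (5.12),

$$\mathsf{P}(d_\omega(s,t) \le \epsilon d(s,t)) \le \exp\Big(\frac{1}{2}\big(\lambda^2\epsilon^2d^2(s,t) - |\lambda|^pd^p(s,t)\big)\Big).$$

The conclusion follows by optimization over $\lambda$. □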
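The optimization over $\lambda$ can be made explicit. Writing $u = |\lambda|d(s,t)$, the exponent is $(u^2\epsilon^2 - u^p)/2$, which is minimal at $u^{2-p} = p/(2\epsilon^2)$, with minimum $-b_p\epsilon^{-\alpha}$. Here $\alpha = 2p/(2-p)$ is exactly the exponent defined by (5.10) and $b_p = \frac{2-p}{4}(p/2)^{p/(2-p)}$. This closed form is our own computation, and the Python sketch below confirms it by a ternary search (the exponent first decreases and then increases in $u > 0$).

```python
def exponent(u, eps, p):
    """Exponent (u^2 eps^2 - u^p) / 2 of the bound, with u = |lambda| d(s, t)."""
    return (u * u * eps * eps - u ** p) / 2


def minimize_exponent(eps, p, hi=1e6, iters=300):
    """Ternary search on [0, hi]; the exponent is unimodal in u > 0."""
    lo = 0.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if exponent(m1, eps, p) < exponent(m2, eps, p):
            hi = m2
        else:
            lo = m1
    return exponent((lo + hi) / 2, eps, p)


def b_p(p):
    """Closed form: the minimum equals -b_p / eps**alpha, with 1/alpha = 1/p - 1/2."""
    return (2 - p) / 4 * (p / 2) ** (p / (2 - p))
```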


5.4 Simplification Through Abstraction 


Now that we have extracted the relevant features, Lemma 5.3.1 and Principle A, we 
can prove an abstract result. 


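Theorem 5.4.1 Consider a finite metric space $(T,d)$ and a random distance $d_\omega$ on $T$. Assume that for some $b > 0$, we have

$$\forall s,t \in T,\ \forall\epsilon > 0,\quad \mathsf{P}(d_\omega(s,t) \le \epsilon d(s,t)) \le \exp\Big(-\frac{b}{\epsilon^\alpha}\Big), \qquad (5.13)$$

where $\alpha > 2$. Then

$$\gamma_q(T,d) \le K\mathsf{E}\gamma_2(T,d_\omega), \qquad (5.14)$$

where

$$\frac{1}{q} = \frac{1}{2} - \frac{1}{\alpha},$$

and where $K$ depends on $\alpha$ and $b$ only.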


Proof of Theorem 5.2.1 When T is Finite We apply Theorem 5.4.1 with the value $\alpha$ given by (5.10), so that the value of $q$ of Theorem 5.4.1 satisfies $1/q = 1 - 1/p$. It follows from (5.14) that $\gamma_q(T,d) \le K\mathsf{E}\gamma_2(T,d_\omega)$, and combining with (5.9), this implies (5.7). □
Of course $T$ need not be finite and some work would be needed to handle this case. This work is predictable and tedious, and now is not the time to be distracted by routine considerations.$^2$ We turn to the proof of Theorem 5.4.1. The main step of the proof is the following lemma:


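Lemma 5.4.2 Consider a probability measure $\mu$ on $T$. Then for each $t \in T$, with probability $\ge 15/16$, we have

$$\int_0^\infty\Big(\log\frac{1}{\mu(B_d(t,\epsilon))}\Big)^{1/q}d\epsilon \le K\int_0^\infty\Big(\log\frac{1}{\mu(B_{d_\omega}(t,\epsilon))}\Big)^{1/2}d\epsilon + K\Delta(T,d). \qquad (5.15)$$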
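Proof We define $\epsilon_0 = \Delta(T,d)$, and for $n \ge 1$, we define $\epsilon_n$ by

$$\epsilon_n = \inf\{\epsilon > 0;\ \mu(B_d(t,\epsilon)) \ge 1/N_n\}$$

and we prove first that

$$\int_0^\infty\Big(\log\frac{1}{\mu(B_d(t,\epsilon))}\Big)^{1/q}d\epsilon \le K\sum_{n\ge1}2^{n/q}\epsilon_n + K\Delta(T,d). \qquad (5.16)$$

For this, let us set $f(\epsilon) = (\log(1/\mu(B_d(t,\epsilon))))^{1/q}$ and observe that $f(\epsilon) = 0$ for $\epsilon \ge \epsilon_0 = \Delta(T,d)$. Since $f(\epsilon) \le K2^{n/q}$ for $\epsilon > \epsilon_n$, we have

$$\int_0^\infty f(\epsilon)\,d\epsilon = \sum_{n\ge0}\int_{\epsilon_{n+1}}^{\epsilon_n}f(\epsilon)\,d\epsilon \le K\sum_{n\ge0}2^{n/q}\epsilon_n = K\sum_{n\ge1}2^{n/q}\epsilon_n + K\Delta(T,d).$$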


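The heart of the argument starts now. By (5.13), it holds that for any $s$

$$d(s,t) \ge \epsilon_n \Rightarrow \mathsf{P}\big(d_\omega(s,t) \le 2^{-n/\alpha}\epsilon_n/K\big) \le \exp(-2^{n+2}),$$

so that by Fubini's theorem

$$\mathsf{E}\mu\big(\{s;\ d(s,t) \ge \epsilon_n,\ d_\omega(s,t) \le 2^{-n/\alpha}\epsilon_n/K\}\big) \le \exp(-2^{n+2})$$

and by Markov's inequality,

$$\mathsf{P}\Big(\mu\big(\{s;\ d(s,t) \ge \epsilon_n,\ d_\omega(s,t) \le 2^{-n/\alpha}\epsilon_n/K\}\big) \ge 1/N_n\Big) \le N_n\exp(-2^{n+2}).$$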


2 It is explained in the proof of Theorem 11.7.1 how to cover the case where $T$ is countable.
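As $\sum_{n\ge1}N_n\exp(-2^{n+2}) \le \sum_{n\ge1}\exp(-3\cdot2^n) \le 1/16$, with probability $\ge 15/16$, for each $n \ge 1$ one has

$$\mu\big(\{s;\ d(s,t) \ge \epsilon_n,\ d_\omega(s,t) \le 2^{-n/\alpha}\epsilon_n/K\}\big) < 1/N_n.$$

Since $\mu(\{s;\ d(s,t) < \epsilon_n\}) \le 1/N_n$, it follows that

$$\mu\big(B_{d_\omega}(t, 2^{-n/\alpha}\epsilon_n/K)\big) < 2/N_n.$$

Consequently, setting $\eta_n = 2^{-n/\alpha}\epsilon_n/K$, for $\epsilon < \eta_n$ we have $(\log(1/\mu(B_{d_\omega}(t,\epsilon))))^{1/2} \ge 2^{n/2}/K$, so that, since the sequence $(\eta_n)$ decreases,

$$\int_0^\infty\Big(\log\frac{1}{\mu(B_{d_\omega}(t,\epsilon))}\Big)^{1/2}d\epsilon \ge \sum_{n\ge1}\int_{\eta_{n+1}}^{\eta_n}\Big(\log\frac{1}{\mu(B_{d_\omega}(t,\epsilon))}\Big)^{1/2}d\epsilon \ge \frac{1}{K}\sum_{n\ge1}2^{n/2}(\eta_n - \eta_{n+1}) \ge \frac{1}{K}\sum_{n\ge1}\big(2^{n/2} - 2^{(n-1)/2}\big)\eta_n = \frac{1}{K}\sum_{n\ge1}2^{n/q}\epsilon_n,$$

where we use in the last equality that $1/2 - 1/\alpha = 1/q$. Combining with (5.16), the proof of (5.15) is finished. □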
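Two numerical facts used in this proof can be double-checked quickly. The Python sketch below (our own illustration) verifies termwise that $N_n\exp(-2^{n+2}) \le \exp(-3\cdot2^n)$ for $N_n = 2^{2^n}$ and that the resulting sum is below $1/16$. It also checks, on a random decreasing sequence, the summation by parts $\sum_{n\ge1}2^{n/2}(\eta_n - \eta_{n+1}) \ge (1 - 2^{-1/2})\sum_{n\ge1}2^{n/2}\eta_n$ that underlies the last display. Here the sequence is finite and is extended by 0 after its last term.

```python
import math
import random

# log N_n = 2^n log 2 for N_n = 2^(2^n); terms beyond n = 11 underflow to 0.0 anyway
terms = [math.exp(2 ** n * math.log(2) - 2 ** (n + 2)) for n in range(1, 12)]
bounds = [math.exp(-3 * 2 ** n) for n in range(1, 12)]


def abel_sides(eta):
    """Both sides of the summation by parts, eta = (eta_1, ..., eta_K), eta_{K+1} = 0."""
    padded = list(eta) + [0.0]  # padded[n - 1] = eta_n
    left = sum(2 ** (n / 2) * (padded[n - 1] - padded[n]) for n in range(1, len(eta) + 1))
    right = (1 - 2 ** -0.5) * sum(2 ** (n / 2) * padded[n - 1] for n in range(1, len(eta) + 1))
    return left, right


rng = random.Random(7)
eta = sorted((rng.random() for _ in range(20)), reverse=True)  # a decreasing sequence
```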
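Proof of Theorem 5.4.1 It follows from (2.7) and (5.15) that

$$\int_0^\infty\Big(\log\frac{1}{\mu(B_d(t,\epsilon))}\Big)^{1/q}d\epsilon \le K\mathsf{E}\int_0^\infty\Big(\log\frac{1}{\mu(B_{d_\omega}(t,\epsilon))}\Big)^{1/2}d\epsilon + K\Delta(T,d), \qquad (5.17)$$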


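so that, integrating with respect to $\mu$ and using linearity of expectation,

$$\int d\mu(t)\int_0^\infty\Big(\log\frac{1}{\mu(B_d(t,\epsilon))}\Big)^{1/q}d\epsilon \le K\mathsf{E}\int d\mu(t)\int_0^\infty\Big(\log\frac{1}{\mu(B_{d_\omega}(t,\epsilon))}\Big)^{1/2}d\epsilon + K\Delta(T,d).$$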


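It follows from (3.32) that the expectation on the right-hand side is $\le K\mathsf{E}\gamma_2(T,d_\omega)$. This bound does not depend on $\mu$, so that we have proved that

$$\sup_\mu\int d\mu(t)\int_0^\infty\Big(\log\frac{1}{\mu(B_d(t,\epsilon))}\Big)^{1/q}d\epsilon \le K\mathsf{E}\gamma_2(T,d_\omega) + K\Delta(T,d). \qquad (5.18)$$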
Next, we show that the last term is of smaller order. Given $s, t \in T$, from (5.13), with probability $\ge 1/2$, one has $d_\omega(s,t) \ge d(s,t)/K$. Applying this to a pair $s, t$ with $d(s,t) = \Delta(T,d)$, we obtain that with probability $\ge 1/2$ we have $\Delta(T,d_\omega) \ge \Delta(T,d)/K$, so that

$$\Delta(T,d) \le K\Delta(T,d_\omega) \le K\gamma_2(T,d_\omega),$$

and taking expectation, we obtain $\Delta(T,d) \le K\mathsf{E}\gamma_2(T,d_\omega)$. Combining with (5.18), we finally obtain


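$$\sup_\mu\int d\mu(t)\int_0^\infty\Big(\log\frac{1}{\mu(B_d(t,\epsilon))}\Big)^{1/q}d\epsilon \le K\mathsf{E}\gamma_2(T,d_\omega). \qquad (5.19)$$

Now, just as in the case $q = 2$ of (3.31), we can use Fernique's convexity argument to prove that

$$\gamma_q(T,d) \le K\sup_\mu\int d\mu(t)\int_0^\infty\Big(\log\frac{1}{\mu(B_d(t,\epsilon))}\Big)^{1/q}d\epsilon,$$

and combining with (5.19), this concludes the proof. □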


5.5 1-Stable Processes 


In this section, we state an extension of Theorem 5.4.1 to the case $\alpha = 2$, and we explore the consequences of this result on 1-stable processes. According to (5.3), 1-stable r.v.s do not have expectation. This is the main technical difficulty: we can no longer use expectation to measure the size of 1-stable processes. It seems counterproductive to spend space and energy at this stage on such a specialized topic, so we state our results without proofs. Some proofs can be found in [132].$^3$ We set $M_0 = 1$, $M_n = 2^{2^{2^n}}$ for $n \ge 1$. Given a metric space $(T,d)$, we define

$$\gamma_\infty(T,d) = \inf\sup_{t\in T}\sum_{n\ge0}2^n\Delta(B_n(t)), \qquad (5.20)$$

where the infimum is taken over all increasing families of partitions $(\mathcal{B}_n)$ of $T$ with $\operatorname{card}\mathcal{B}_n \le M_n$. This new quantity is a kind of limit of the quantities $\gamma_\alpha(T,d)$ as $\alpha \to \infty$.


Exercise 5.5.1 Consider the quantity $\gamma^*(T,d)$ defined as

$$\gamma^*(T,d) = \inf\sup_{t\in T}\sum_{n\ge0}\Delta(A_n(t)), \qquad (5.21)$$

where the infimum is computed over all admissible sequences of partitions $(\mathcal{A}_n)$. Prove that

$$\frac{1}{L}\gamma^*(T,d) \le \gamma_\infty(T,d) \le L\gamma^*(T,d). \qquad (5.22)$$

Hint: Given an increasing sequence of partitions $(\mathcal{B}_n)$ with $\operatorname{card}\mathcal{B}_n \le M_n$, consider the increasing sequence of partitions $(\mathcal{A}_m)$ given by $\mathcal{A}_m = \mathcal{B}_n$ for $2^n \le m < 2^{n+1}$.

3 We know now how to give much simpler proofs than those of [132].


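Theorem 5.5.2 Consider a finite metric space $(T,d)$ and a random distance $d_\omega$ on $T$. Assume that

$$\forall s,t \in T,\ \forall\epsilon > 0,\quad \mathsf{P}(d_\omega(s,t) \le \epsilon d(s,t)) \le \exp\Big(-\frac{1}{\epsilon^2}\Big).$$

Then

$$\mathsf{P}\Big(\gamma_2(T,d_\omega) \ge \frac{1}{L}\gamma_\infty(T,d)\Big) \ge \frac{3}{4}.$$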


Applying this result to 1-stable processes, we obtain the following:

Theorem 5.5.3 For every 1-stable process $(X_t)_{t\in T}$ and $t_0 \in T$, we have

$$\mathsf{P}\Big(\sup_{t\in T}(X_t - X_{t_0}) \ge \frac{1}{L}\gamma_\infty(T,d)\Big) \ge \frac{1}{L}.$$

This result looks weak, but it is hard to improve: when $T$ consists of two points $t_0$ and $t_1$, then $\sup_{t\in T}(X_t - X_{t_0}) = \max(X_{t_1} - X_{t_0}, 0)$ is 0 when $X_{t_1} - X_{t_0} \le 0$, which happens with probability $1/2$.


Key Ideas to Remember

• We have met the powerful Principle A, which lets us deduce some “smallness” information about a metric space from the existence of a random distance $d_\omega$ such that we control $\mathsf{E}\gamma_2(T,d_\omega)$ from above.

• We have seen a typical application of this principle to gain information about $p$-stable processes, in a way which will be vastly generalized later.


5.6 Where Do We Stand? 


We have found an angle of attack on processes which are conditionally Gaussian. 
Unfortunately, such processes are uncommon. On the other hand, many processes 
are conditionally Bernoulli processes (with a meaning to be explained in the next 
chapter). The same line of attack will work on these processes, but this will require 
considerable work, as the study of Bernoulli processes is much more difficult than 
that of Gaussian processes. 


Chapter 6
Bernoulli Processes


6.1 Bernoulli r.v.s


Throughout the book, we denote by $\varepsilon_i$ independent Bernoulli (= coin flipping) r.v.s; that is,

$$\mathsf{P}(\varepsilon_i = \pm1) = \frac{1}{2}.$$

(Thus $\varepsilon_i$ is a r.v., while $\epsilon_i$ is a small positive number.)

Consider independent symmetric r.v.s $\xi_i$. It is fundamental that if $(\varepsilon_i)$ denotes an independent sequence of Bernoulli r.v.s, which is independent of the sequence $(\xi_i)$, then the sequences $(\xi_i)$ and $(\varepsilon_i\xi_i)$ have the same distribution. This is obvious since this is already the case conditionally on the sequence $(\varepsilon_i)$, by the very definition of the fact that the sequence $(\xi_i)$ is symmetric. We will spend much time studying random sums (and series) of functions of the type $X(u) = \sum_i\xi_ix_i(u)$ where the $x_i$ are functions on an index set $U$. This sum has the same distribution as the sum $\sum_i\varepsilon_i\xi_ix_i(u)$. Given the randomness of the $\xi_i$, this is a random sum of the type $\sum_i\varepsilon_if_i(u)$ where the $f_i(u)$ are functions. Then, all that matters is the set of coefficients $T = \{t = (f_i(u))_i;\ u \in U\}$. This motivates the forthcoming definition of Bernoulli processes.

Given a sequence $(t_i)_{i\ge1} \in \ell^2 = \ell^2(\mathbb{N}^*)$, we define

$$X_t = \sum_{i\ge1}t_i\varepsilon_i. \qquad (6.1)$$

Consider a subset $T$ of $\ell^2$. The Bernoulli process defined by $T$ is the family $(X_t)_{t\in T}$.
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A simple and essential fact is that the rv. X; “has better tails than the 
corresponding Gaussian r.v.” as is expressed by the following, for which we refer to 
(53, page 90] or to Exercise 6.1.2: 


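Lemma 6.1.1 (The Sub-Gaussian Inequality) Consider independent Bernoulli
r.v.s $\varepsilon_i$ and real numbers $t_i$. Then for each $u > 0$, we have

$$\mathsf{P}\Big(\Big|\sum_i \varepsilon_i t_i\Big| \ge u\Big) \le 2\exp\Big(-\frac{u^2}{2\sum_i t_i^2}\Big). \tag{6.2}$$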


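Exercise 6.1.2

(a) Use Taylor series to prove that for $\lambda \in \mathbb{R}$,

$$\mathsf{E}\exp\lambda\varepsilon_i = \cosh\lambda \le \exp\frac{\lambda^2}{2}.$$

(b) Prove that

$$\mathsf{E}\exp\Big(\lambda\sum_i \varepsilon_i t_i\Big) \le \exp\Big(\frac{\lambda^2}{2}\sum_i t_i^2\Big),$$

and prove (6.2) using the formula $\mathsf{P}(X \ge u) \le \exp(-\lambda u)\,\mathsf{E}\exp\lambda X$ for $u > 0$
and $\lambda > 0$.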


Corollary 6.1.3 (Khintchin's Inequality) Consider complex numbers $t_i$, independent
Bernoulli r.v.s $\varepsilon_i$, and $p \ge 1$. Then

$$\Big(\mathsf{E}\Big|\sum_i \varepsilon_i t_i\Big|^p\Big)^{1/p} \le L\sqrt{p}\,\Big(\sum_i |t_i|^2\Big)^{1/2}. \tag{6.3}$$

Proof We reduce to the case of real numbers, and we combine the sub-Gaussian
inequality with (2.24). □
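As a sanity check of (6.3) (a brute-force sketch, not part of the original argument), one can compute $\mathsf{E}|\sum_i\varepsilon_i t_i|^p$ exactly for a short, hypothetical real vector $t$ by averaging over all $2^n$ sign patterns. The helper names `rademacher_moment` and `khintchine_ratio` are illustrative only. For even $p$ the printed ratio is at most $((p-1)!!)^{1/p}/\sqrt{p} \le 1$, because the even moments of a Bernoulli sum are dominated by those of a Gaussian r.v. with the same variance.

```python
from itertools import product
from math import sqrt


def rademacher_moment(t, p):
    """Exact E|sum_i eps_i t_i|^p, averaging over all 2^n equally likely sign patterns."""
    total = 0.0
    for signs in product((-1, 1), repeat=len(t)):
        total += abs(sum(s * x for s, x in zip(signs, t))) ** p
    return total / 2 ** len(t)


def khintchine_ratio(t, p):
    """(E|S|^p)^(1/p) / (sqrt(p) * ||t||_2): the quantity that (6.3) bounds by a constant L."""
    norm2 = sqrt(sum(x * x for x in t))
    return rademacher_moment(t, p) ** (1.0 / p) / (sqrt(p) * norm2)


# Hypothetical coefficients, chosen only for illustration.
t = [1.0, 0.5, 0.5, 0.25, 0.25, 0.25, 0.1, 0.1]
for p in (1, 2, 4, 6, 8):
    print(p, round(khintchine_ratio(t, p), 4))
```

For $p = 2$ the ratio is exactly $1/\sqrt{2}$, since $\mathsf{E}(\sum_i\varepsilon_i t_i)^2 = \sum_i t_i^2$.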


In the case where $t_i = 1$ for $i \le N$ and $t_i = 0$ otherwise, (6.2) gives a strong
quantitative form to the well-known statement that $\sum_{i\le N}\varepsilon_i$ is typically of order
$\sqrt{N}$. A sum of $N$ numbers of absolute value 1 can be of order $\sqrt{N}$ only
because there is cancellation between the terms. The statement fails completely when no
such cancellation occurs, e.g., in the exceptional case where all the $\varepsilon_i$ equal 1. In
contrast, a bound such as $|\sum_i \varepsilon_i t_i| \le \sum_i |t_i|$, which we will use massively below,
does not rely on cancellation, since it holds even if all terms are of the same sign.
More generally, there is typically cancellation in a sum $\sum_i \varepsilon_i t_i$ when $\sqrt{\sum_i |t_i|^2} \ll \sum_i |t_i|$.
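In the case $t_i = 1$ for $i \le N$, the sum $\sum_{i\le N}\varepsilon_i$ equals $2K - N$, where $K$ is binomial with parameters $N$ and $1/2$, so the tail in (6.2) can be computed exactly. The following Python sketch (the function names are hypothetical, chosen for illustration) compares this exact tail with the right-hand side of (6.2) at $u = c\sqrt{N}$, which makes the $\sqrt{N}$ scale of the cancellation visible.

```python
from math import comb, exp, sqrt


def exact_tail(n, u):
    """P(|eps_1 + ... + eps_n| >= u), exactly: the sum equals 2K - n with K ~ Binomial(n, 1/2)."""
    favourable = sum(comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= u)
    return favourable / 2 ** n


def subgaussian_bound(n, u):
    """Right-hand side of (6.2) for t_i = 1 (i <= n), where sum_i t_i^2 = n."""
    return 2 * exp(-u * u / (2 * n))


n = 100
for c in (0.5, 1.0, 2.0, 3.0):
    u = c * sqrt(n)
    print(f"u = {c} sqrt(n): exact {exact_tail(n, u):.3e}, bound (6.2) {subgaussian_bound(n, u):.3e}")
```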
6.2 Boundedness of Bernoulli Processes

Considering a subset $T$ of $\ell^2 = \ell^2(\mathbb{N}^*)$ and the corresponding Bernoulli process
$(X_t)_{t\in T}$, we set

$$b(T) := \mathsf{E}\sup_{t\in T} X_t = \mathsf{E}\sup_{t\in T}\sum_{i\ge1} t_i\varepsilon_i. \tag{6.4}$$

We observe that $b(T) \ge 0$, that $b(T) \le b(T')$ if $T \subset T'$, and that $b(T + t_0) = b(T)$,
where $T + t_0 = \{t + t_0;\ t \in T\}$.

We would like to understand the value of $b(T)$ from the geometry of $T$, as we
did in the case of Gaussian processes. Lemma 6.1.1 states that the process $(X_t)_{t\in T}$
satisfies the increment condition (2.4), so that Theorem 2.7.11 implies

$$b(T) \le L\gamma_2(T), \tag{6.5}$$

where we remind the reader that we often write $\gamma_2(T)$ instead of $\gamma_2(T, d)$ when $d$
is the $\ell^2$ distance.¹ Let us now write

$$g(T) = \mathsf{E}\sup_{t\in T}\sum_{i\ge1} t_i g_i.$$


Since $\gamma_2(T) \le Lg(T)$ by Theorem 2.10.1, Bernoulli processes "are smaller than
the corresponding Gaussian processes". There is a much simpler direct proof of this
fact.


Exercise 6.2.1 (Review of Jensen's Inequality) In this exercise, we review
Jensen's inequality, a basic tool of probability theory. It states that if $X$ is a r.v.
valued in a vector space $W$ and $\Phi$ is a convex function on $W$, then $\Phi(\mathsf{E}X) \le \mathsf{E}\Phi(X)$.
When using this inequality, we will use the sentence "we lower the value by taking
expectation inside $\Phi$ rather than outside".

(a) If you know the Hahn-Banach theorem, convince yourself (or learn in a book)
that $\Phi(x) = \sup_{f\in\mathcal{C}} f(x)$, where $\mathcal{C}$ is the class of affine functions (the sum of
a constant and a linear function) which are $\le \Phi$. Thus, for $f \in \mathcal{C}$, we have
$\mathsf{E}f(X) = f(\mathsf{E}X)$. Deduce Jensen's inequality from this fact.

(b) For a r.v. $X$ and $p \ge 1$, prove that $|x + \mathsf{E}X|^p \le \mathsf{E}|x + X|^p$.

(c) If $X$, $Y$ and $\theta \ge 0$ are independent r.v.s, prove that $\mathsf{E}\theta|X + \mathsf{E}Y|^p \le \mathsf{E}\theta|X + Y|^p$.


¹ Since (6.5) makes massive use of the sub-Gaussian inequality (6.2) to control the increments
along the chaining, it will be natural to say that this bound relies on cancellation, in sharp contrast
with the bound (6.8) below.
Proposition 6.2.2 It holds that

$$b(T) \le \sqrt{\frac{\pi}{2}}\, g(T). \tag{6.6}$$

Proof If $(\varepsilon_i)_{i\ge1}$ is an i.i.d. Bernoulli sequence that is independent of the sequence
$(g_i)_{i\ge1}$, then the sequence $(\varepsilon_i|g_i|)_{i\ge1}$ is i.i.d. standard Gaussian. Thus

$$g(T) = \mathsf{E}\sup_{t\in T}\sum_{i\ge1}\varepsilon_i|g_i|t_i.$$

Denoting by $\mathsf{E}_g$ expectation in the r.v.s $g_i$ only (given the r.v.s $\varepsilon_i$) and since
$\mathsf{E}_g|g_i| = \sqrt{2/\pi}$, we get

$$g(T) = \mathsf{E}\,\mathsf{E}_g\sup_{t\in T}\sum_{i\ge1} t_i|g_i|\varepsilon_i \ge \sqrt{\frac{2}{\pi}}\,\mathsf{E}\sup_{t\in T}\sum_{i\ge1} t_i\varepsilon_i = \sqrt{\frac{2}{\pi}}\, b(T),$$

where in the inequality we apply Jensen's inequality to take the expectation
$\mathsf{E}_g$ inside the supremum rather than outside. □
It is worth making a detour to state a general result in the direction of (6.6). 


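Lemma 6.2.3 Consider vectors $x_i$ in a complex Banach space and independent
symmetric real-valued r.v.s $\xi_i$. Then, if $\varepsilon_i$ denote independent Bernoulli r.v.s, we
have

$$\mathsf{E}\Big\|\sum_i \xi_i x_i\Big\| \ge \mathsf{E}\Big\|\sum_i (\mathsf{E}|\xi_i|)\,\varepsilon_i x_i\Big\|. \tag{6.7}$$

Proof Assuming without loss of generality that the r.v.s $\xi_i$ and $\varepsilon_i$ are independent,
we use the symmetry of the r.v.s $\xi_i$ to write

$$\mathsf{E}\Big\|\sum_i \xi_i x_i\Big\| = \mathsf{E}\Big\|\sum_i \varepsilon_i|\xi_i|x_i\Big\|.$$

Now $\mathsf{E}\|\sum_i \varepsilon_i|\xi_i|x_i\| \ge \mathsf{E}\|\sum_i (\mathsf{E}|\xi_i|)\varepsilon_i x_i\|$ as a consequence of Jensen's
inequality. □

Thus to find a lower bound for $\mathsf{E}\|\sum_i \xi_i x_i\|$, we can reduce to the case where $\xi_i$ is
of the type $a_i\varepsilon_i$.²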

² But this method is not always sharp.

We go back to Bernoulli processes. We can bound a Bernoulli process by
comparing it with a Gaussian process or, equivalently, by using (6.6). There is,
however, a completely different method to bound Bernoulli processes. Denoting by
$\|t\|_1 = \sum_{i\ge1}|t_i|$ the $\ell^1$ norm of $t$, the following proposition is trivial:


Proposition 6.2.4 We have

$$b(T) \le \sup_{t\in T}\|t\|_1. \tag{6.8}$$


We have found two very different ways to bound b(T), namely, (6.6) and (6.8). 


Exercise 6.2.5 Convince yourself that these two ways are really different from each
other by considering the following two cases: $T = \{u, 0\}$ where $u \notin \ell^1$, and $T$ the
unit ball of $\ell^1$.
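For Exercise 6.2.5, a small numerical experiment (a sketch under the stated truncations, not a proof) shows the two bounds pulling in opposite directions. We truncate $u_i = 1/i$ to 14 coordinates and compute $b(T)$ exactly by enumeration. For the unit ball of $\ell^1$ in dimension $d = 1000$, we only estimate $g(T) = \mathsf{E}\max_i|g_i|$ by Monte Carlo. The helper `b_exact`, the dimensions and the seed are arbitrary, hypothetical choices.

```python
import random
from itertools import product
from math import sqrt


def b_exact(T):
    """b(T) = E sup_{t in T} sum_i eps_i t_i for a finite T of short vectors, by enumerating all signs."""
    n = len(T[0])
    total = 0.0
    for signs in product((-1, 1), repeat=n):
        total += max(sum(s * x for s, x in zip(signs, t)) for t in T)
    return total / 2 ** n


# Case 1: T = {u, 0} with u_i = 1/i, truncated to n coordinates (u is in l^2 but not in l^1).
n = 14
u = [1.0 / i for i in range(1, n + 1)]
T1 = [u, [0.0] * n]
print("b(T)          =", b_exact(T1))
print("||u||_2 / 2   =", sqrt(sum(x * x for x in u)) / 2)   # b(T) = E|X_u| / 2 <= ||u||_2 / 2
print("sup ||t||_1   =", sum(u))                              # bound (6.8), grows like log n

# Case 2: T = unit ball of l^1 in dimension d. A linear form attains its sup over this ball
# at some +-e_i, so sup_t sum_i eps_i t_i = max_i |eps_i| = 1 and b(T) = 1, matching (6.8),
# whereas g(T) = E max_i |g_i| grows like sqrt(2 log d). Monte Carlo estimate of g(T):
random.seed(0)
d, samples = 1000, 500
g_hat = sum(max(abs(random.gauss(0.0, 1.0)) for _ in range(d)) for _ in range(samples)) / samples
print("g(T) estimate =", g_hat)
```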


We recall the Minkowski sum $T_1 + T_2 = \{t^1 + t^2;\ t^1 \in T_1,\ t^2 \in T_2\}$. The
following definition and proposition formalize the idea that we can also bound $b(T)$
through mixtures of the previous situations.


Definition 6.2.6 For a subset $T$ of $\ell^2$, we set³

$$b^*(T) := \inf\Big\{\gamma_2(T_1) + \sup_{t\in T_2}\|t\|_1;\ T \subset T_1 + T_2\Big\}. \tag{6.9}$$
Since $X_{t^1+t^2} = X_{t^1} + X_{t^2}$, we have

$$\sup_{t\in T_1+T_2} X_t = \sup_{t\in T_1} X_t + \sup_{t\in T_2} X_t.$$

Taking expectation yields $b(T) \le b(T_1 + T_2) = b(T_1) + b(T_2)$ whenever $T \subset T_1 + T_2$.
Combining this with (6.5) and (6.8), we have proved the following:


Proposition 6.2.7 We have
$$b(T) \le Lb^*(T). \tag{6.10}$$


It is natural to conjecture that the previous bound on $b(T)$ is sharp, that is, that
there exists no way to bound Bernoulli processes other than the previous two methods
and mixtures of them. This was known as the Bernoulli conjecture. It took nearly
25 years to prove it.⁴


³ You may find the notation silly, since $T_1$ is controlled by the $\ell^2$ norm and $T_2$ by the $\ell^1$ norm.
The idea underlying my notation, here and in similar situations, is that $T_1$ denotes what I see as the
main part of $T$, whereas $T_2$ is more like a perturbation term.


⁴ Please see Footnote 2 on page 326 concerning the name I give to this result.




Theorem 6.2.8 (The Latała-Bednorz Theorem) There exists a universal constant
$L$ such that given any subset $T$ of $\ell^2$, we have

$$b^*(T) \le Lb(T). \tag{6.11}$$


The proof of Theorem 6.2.8 will consist of describing a procedure to decompose
each point $t \in T$ as a sum $t = t^1 + t^2$ where $\|t^2\|_1 \le Lb(T)$ and $T_1 = \{t^1;\ t \in T\}$
satisfies $\gamma_2(T_1) \le Lb(T)$. This procedure makes $T$ naturally appear as a subset
of a sum $T_1 + T_2$, even though $T$ itself may be very different from such a sum.
The intrinsic difficulty is that this decomposition is neither unique nor canonical.
To illustrate the difficulty, consider a set $T_1$ with $\gamma_2(T_1) \le 1$, so that $b(T_1) \le L$.
To each point $t$ of $T_1$, let us associate a point $\varphi(t)$ with $\|\varphi(t)\|_1 \le 1$, and let
$T = \{t + \varphi(t);\ t \in T_1\}$. Thus $b(T) \le L$. Now, we are only given the set $T$. How do we
reconstruct the set $T_1$?

The proof of the Latała-Bednorz result involves a number of deep ideas. These
are better presented gradually in simpler situations (following the path by which
they were discovered), and the proof of the theorem is delayed to Chap. 10. In the
rest of the present chapter, we build our understanding of Bernoulli processes. In
the next section, we present three fundamental results for Bernoulli processes, two of
which have close relationships with properties of Gaussian processes. We then start
developing the leading idea: to control Bernoulli processes, one has to control the
index set $T$ with respect to the supremum norm.


6.3 Concentration of Measure 


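The following "concentration of measure" result should be compared with
Lemma 2.10.6:

Theorem 6.3.1 Consider a subset $T \subset \ell^2$, and assume that for a certain $\sigma > 0$,
we have $T \subset B(0, \sigma)$. Consider numbers $(a(t))_{t\in T}$, and let $M$ be a median of the
r.v. $\sup_{t\in T}(\sum_i \varepsilon_i t_i + a(t))$. Then

$$\forall u > 0,\quad \mathsf{P}\Big(\Big|\sup_{t\in T}\Big(\sum_{i\ge1}\varepsilon_i t_i + a(t)\Big) - M\Big| \ge u\Big) \le 4\exp\Big(-\frac{u^2}{8\sigma^2}\Big). \tag{6.12}$$

In particular,

$$\Big|\mathsf{E}\sup_{t\in T}\Big(\sum_{i\ge1}\varepsilon_i t_i + a(t)\Big) - M\Big| \le L\sigma \tag{6.13}$$

and

$$\forall u > 0,\quad \mathsf{P}\Big(\Big|\sup_{t\in T}\Big(\sum_{i\ge1}\varepsilon_i t_i + a(t)\Big) - \mathsf{E}\sup_{t\in T}\Big(\sum_{i\ge1}\varepsilon_i t_i + a(t)\Big)\Big| \ge u\Big) \le L\exp\Big(-\frac{u^2}{L\sigma^2}\Big). \tag{6.14}$$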


The proof relies on the fact that the function $\varphi(x) = \sup_{t\in T}(\sum_{i\ge1} x_i t_i + a(t))$
on $\ell^2$ is convex and has a Lipschitz constant $\le \sigma$. Such a function always satisfies
a deviation inequality $\mathsf{P}(|\varphi((\varepsilon_i)_{i\ge1}) - M| \ge u) \le 4\exp(-u^2/(8\sigma^2))$ when $M$
is a median of $\varphi((\varepsilon_i)_{i\ge1})$. This fact has a short and almost magic proof (see, for
example, [53] (1.9)). We do not reproduce this proof for a good reason: if the reader
intends to become fluent in the area of probability theory we consider here, she must
learn more about concentration of measure, and this is better done by looking, for
example, at [121] and [52] rather than just at the proof of Theorem 6.3.1.

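We end this section with a few simple but important facts. We first recall the Paley-Zygmund
inequality (sometimes also called the second moment method): for a r.v.
$X \ge 0$ with $\mathsf{E}X^2 > 0$,

$$\mathsf{P}\Big(X \ge \frac{1}{2}\mathsf{E}X\Big) \ge \frac{1}{4}\frac{(\mathsf{E}X)^2}{\mathsf{E}X^2}. \tag{6.15}$$

Exercise 6.3.2 Prove (6.15). Hint: Let $A = \{X \ge \mathsf{E}X/2\}$. Show that $\mathsf{E}(X\mathbf{1}_{A^c}) \le \mathsf{E}X/2$.
Show that $\mathsf{E}X/2 \le \mathsf{E}(X\mathbf{1}_A) \le (\mathsf{E}X^2\,\mathsf{P}(A))^{1/2}$.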


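Corollary 6.3.3 If $\varepsilon_i$ are independent Bernoulli r.v.s and $t_i$ are numbers, it holds
that

$$\mathsf{P}\Big(\Big|\sum_{i\ge1}\varepsilon_i t_i\Big| \ge \frac{1}{2}\Big(\sum_{i\ge1}|t_i|^2\Big)^{1/2}\Big) \ge \frac{1}{L}. \tag{6.16}$$

Proof By the sub-Gaussian inequality and (2.24), the r.v. $X = |\sum_{i\ge1}\varepsilon_i t_i|^2$
satisfies $\mathsf{E}X^2 \le L(\sum_{i\ge1}|t_i|^2)^2 = L(\mathsf{E}X)^2$. We then apply the Paley-Zygmund
inequality (6.15). □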
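The following sketch checks (6.15) exactly on the law of $X = |\sum_i\varepsilon_i t_i|^2$, which is the variable used in the proof of Corollary 6.3.3, for a few hypothetical coefficient vectors. Enumerating all sign patterns gives the exact distribution, so no sampling error is involved. The function names are illustrative only.

```python
from itertools import product


def square_values(t):
    """The 2^n equally likely values of X = (sum_i eps_i t_i)^2."""
    return [sum(s * x for s, x in zip(signs, t)) ** 2 for signs in product((-1, 1), repeat=len(t))]


def paley_zygmund(values):
    """For X uniform on `values`: return P(X >= EX/2) and the lower bound (EX)^2 / (4 EX^2) of (6.15)."""
    n = len(values)
    mean = sum(values) / n
    second = sum(v * v for v in values) / n
    prob = sum(1 for v in values if v >= mean / 2) / n
    return prob, mean * mean / (4 * second)


# Hypothetical coefficient vectors; the last one has a single dominant coefficient.
cases = [[1.0, 1.0], [1.0] * 8, [3.0, 1.0, 1.0, 0.5, 0.5, 0.2, 0.2, 0.1], [10.0, 0.1, 0.1, 0.1]]
for t in cases:
    prob, lower = paley_zygmund(square_values(t))
    print(t, prob, lower)
```

Since $X \ge \mathsf{E}X/2$ means $|\sum_i\varepsilon_i t_i| \ge \|t\|_2/\sqrt{2}$, each printed probability also bounds from below the probability in (6.16).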


Exercise 6.3.4 As a consequence of (6.16), prove that if the series $\sum_{i\ge1}\varepsilon_i t_i$
converges a.s., then $\sum_{i\ge1} t_i^2 < \infty$.


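As a consequence of (6.16), we have

$$\mathsf{E}\Big|\sum_{i\ge1}\varepsilon_i t_i\Big| \ge \frac{1}{L}\|t\|_2. \tag{6.17}$$

Exercise 6.3.5 Prove that for a r.v. $Y \ge 0$ with $\mathsf{E}Y > 0$, one has $\mathsf{E}Y\,\mathsf{E}Y^3 \ge (\mathsf{E}Y^2)^2$,
and find another proof of (6.17).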
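Here is a quick exact check of (6.17), and of the moment inequality of Exercise 6.3.5 with $Y = |\sum_i\varepsilon_i t_i|$, using illustrative helper names and hypothetical vectors. It is known (Szarek's inequality) that the best constant in (6.17) is $1/\sqrt{2}$, and that $t = (1, 1)$ attains it. The sketch only verifies this lower bound on the listed examples.

```python
from itertools import product
from math import sqrt


def abs_moments(t):
    """Exact E|S|, E|S|^2, E|S|^3 for S = sum_i eps_i t_i, by enumerating all sign patterns."""
    vals = [abs(sum(s * x for s, x in zip(signs, t))) for signs in product((-1, 1), repeat=len(t))]
    n = len(vals)
    return tuple(sum(v ** k for v in vals) / n for k in (1, 2, 3))


# Hypothetical coefficient vectors used only for illustration.
cases = [[1.0, 1.0], [1.0, 1.0, 1.0], [2.0, 1.0, 1.0, 0.5], [1.0] * 10]
for t in cases:
    m1, m2, m3 = abs_moments(t)
    norm2 = sqrt(sum(x * x for x in t))
    # First printed value: E|S| / ||t||_2, which (6.17) bounds from below by 1/L.
    # Second printed value: whether E Y E Y^3 >= (E Y^2)^2 holds for Y = |S| (Exercise 6.3.5).
    print(t, round(m1 / norm2, 4), m1 * m3 >= m2 * m2)
```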
Lemma 6.3.6 For a subset $T$ of $\ell^2$, we have
$$\Delta(T, d_2) \le Lb(T). \tag{6.18}$$

Proof Assuming without loss of generality that $0 \in T$, we have

$$\forall t \in T,\quad b(T) \ge \mathsf{E}\max\Big(0, \sum_{i\ge1}\varepsilon_i t_i\Big) = \frac{1}{2}\mathsf{E}\Big|\sum_{i\ge1}\varepsilon_i t_i\Big| \ge \frac{1}{L}\|t\|_2,$$

using that $\max(x, 0) = (|x| + x)/2$ in the equality and (6.17) in the last inequality.
This proves (6.18). □


6.4 Sudakov Minoration

In this section, we prove a version of Lemma 2.10.2 (Sudakov minoration) for
Bernoulli processes. This will be our first contact with the essential idea that when
all the coefficients $t_i$ are small (say, compared to $\|t\|_2$), the r.v. $\sum_{i\ge1} t_i\varepsilon_i$ resembles
a Gaussian r.v., by the central limit theorem. Therefore one expects that when, in
some sense, the set $T$ is small for the $\ell^\infty$ norm, $g(T)$ (or, equivalently, $\gamma_2(T)$) is not
much larger than $b(T)$. This will also be the main idea of Sect. 6.6.


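Theorem 6.4.1 Consider $t_1, \ldots, t_m$ in $\ell^2$, and assume that
$$\ell \ne \ell' \Rightarrow \|t_\ell - t_{\ell'}\|_2 \ge a. \tag{6.19}$$
Assume moreover that
$$\forall \ell \le m,\quad \|t_\ell\|_\infty \le b. \tag{6.20}$$

Then

$$\mathsf{E}\sup_{\ell\le m}\sum_{i\ge1}\varepsilon_i t_{\ell,i} \ge \frac{1}{L}\min\Big(a\sqrt{\log m}, \frac{a^2}{b}\Big). \tag{6.21}$$

For a first understanding of this theorem, one should consider the case where $t_\ell$ is the
$\ell$-th element of the canonical basis of $\ell^2$. One should also compare (6.21) with Lemma 2.10.2,
which in the present language asserts that

$$\mathsf{E}\sup_{\ell\le m}\sum_{i\ge1} g_i t_{\ell,i} \ge \frac{a}{L}\sqrt{\log m}, \tag{6.22}$$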
and will be the basis of the proof of (6.21). To understand the need for the minimum
in (6.21), you should solve the next exercise.

Exercise 6.4.2 Convince yourself that in (6.21) the term $a^2/b$ is of the correct
order. Hint: Remember that $\sum_i \varepsilon_i t_{\ell,i} \le \sum_i |t_{\ell,i}|$. Look for examples where
$t_{\ell,i} \in \{0, b\}$.
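The basis example mentioned after Theorem 6.4.1 (with $t_{\ell,i} \in \{0, b\}$ and $b = 1$) can be worked out exactly. For $t_\ell = e_\ell$ we have $a = \sqrt{2}$, $b = 1$, and $\mathsf{E}\max_{\ell\le m}\varepsilon_\ell = 1 - 2^{1-m}$, since the maximum equals $-1$ only when all signs are $-1$. The sketch below (the helper `b_basis` is hypothetical) confirms this by enumeration. Here $b(T)$ stays below 1 while $a\sqrt{\log m}$ grows, so only the term $a^2/b = 2$ in (6.21) can be of the correct order.

```python
from itertools import product
from math import log, sqrt


def b_basis(m):
    """b(T) for T = {e_1, ..., e_m}: E max_{l <= m} eps_l, computed by enumerating all signs."""
    return sum(max(signs) for signs in product((-1, 1), repeat=m)) / 2 ** m


a, b = sqrt(2.0), 1.0   # ||e_l - e_l'||_2 = sqrt(2) for l != l', and ||e_l||_inf = 1
for m in (2, 4, 8, 12):
    rhs_terms = (a * sqrt(log(m)), a * a / b)   # the two terms inside the minimum of (6.21)
    print(m, b_basis(m), 1 - 2.0 ** (1 - m), rhs_terms)
```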


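Corollary 6.4.3 For a set $T \subset \ell^2$ such that
$$\forall t \in T,\quad \|t\|_\infty \le b$$

and for any $a > 0$, we have

$$b(T) \ge \frac{1}{L}\min\Big(a\sqrt{\log N(T, d_2, a)}, \frac{a^2}{b}\Big). \tag{6.23}$$

Proof By Lemma 2.9.3(a) for $m = N(T, d_2, a)$, we can find points $(t_\ell)_{\ell\le m}$ of $T$ satisfying (6.19),
and they satisfy (6.20) by hypothesis. □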


We start the preparations for the proof of (6.21). 


Lemma 6.4.4 (The Contraction Principle) Consider independent and symmetric
r.v.s $\eta_i$ valued in a Banach space and numbers $a_i$ with $|a_i| \le 1$. Then

$$\mathsf{E}\Big\|\sum_{i\ge1} a_i\eta_i\Big\| \le \mathsf{E}\Big\|\sum_{i\ge1}\eta_i\Big\|. \tag{6.24}$$

Proof We consider the quantity $\mathsf{E}\|\sum_{i\ge1} a_i\eta_i\|$ as a function of the numbers $a_i$. It
is convex, and its domain is a convex compact set. Therefore it attains its maximum
at an extreme point of its domain. For such an extreme point, $a_i = \pm1$ for each $i$,
and in that case, since the $\eta_i$ are symmetric, the left- and right-hand sides of (6.24) coincide. □
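A brute-force illustration of (6.24), assuming $\eta_i = \varepsilon_i x_i$ for fixed, randomly drawn vectors $x_i \in \mathbb{R}^3$ equipped with the supremum norm (any finite-dimensional normed space would do). The expectation is computed exactly over all sign patterns. The helper name `expected_sup_norm` and the random choices are illustrative only.

```python
import random
from itertools import product


def expected_sup_norm(vectors, coeffs):
    """E || sum_i coeffs[i] * eps_i * vectors[i] ||_inf, exactly, over all 2^n sign patterns."""
    n, d = len(vectors), len(vectors[0])
    total = 0.0
    for signs in product((-1, 1), repeat=n):
        s = [sum(c * e * v[k] for c, e, v in zip(coeffs, signs, vectors)) for k in range(d)]
        total += max(abs(x) for x in s)
    return total / 2 ** n


random.seed(0)
n, d = 8, 3
vectors = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]   # hypothetical x_i in R^3
coeffs = [random.uniform(-1, 1) for _ in range(n)]                         # numbers with |a_i| <= 1
lhs = expected_sup_norm(vectors, coeffs)
rhs = expected_sup_norm(vectors, [1.0] * n)
print(lhs, rhs)   # (6.24) predicts lhs <= rhs
```

Choosing all $a_i = \pm 1$ recovers the right-hand side exactly, which is the extreme-point case used in the proof.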


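We will also need the following variation of Bernstein's inequality:

Lemma 6.4.5 Consider centered independent r.v.s $W_i$ and numbers $a_i$ such that
$\mathsf{E}\exp(|W_i|/a_i) \le 2$. Then for $v > 0$, we have

$$\mathsf{P}\Big(\sum_{i\ge1} W_i \ge v\Big) \le 2\exp\Big(-\frac{1}{L}\min\Big(\frac{v^2}{\sum_{i\ge1} a_i^2}, \frac{v}{\sup_{i\ge1} a_i}\Big)\Big). \tag{6.25}$$

Proof We write

$$|\exp x - 1 - x| \le \sum_{k\ge2}\frac{|x|^k}{k!} \le x^2\exp|x|,$$

so that, since $\mathsf{E}W_i = 0$ and using the Cauchy-Schwarz inequality in the second inequality,

$$\mathsf{E}\exp\lambda W_i \le 1 + \lambda^2\,\mathsf{E}\big(W_i^2\exp|\lambda W_i|\big) \le 1 + \lambda^2(\mathsf{E}W_i^4)^{1/2}(\mathsf{E}\exp 2|\lambda W_i|)^{1/2}.$$

Now for $|\lambda|a_i \le 1/2$, we have $\mathsf{E}\exp 2|\lambda W_i| \le 2$. Since $\mathsf{E}\exp(|W_i|/a_i) \le 2$, we
also have $\mathsf{E}W_i^4 \le La_i^4$. Thus, for $|\lambda|a_i \le 1/2$, we have

$$\mathsf{E}\exp\lambda W_i \le 1 + L\lambda^2 a_i^2 \le \exp(L\lambda^2 a_i^2),$$

from which the conclusion follows as in the proof of (4.44). □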


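A technical ingredient of the proof of Theorem 6.4.1 is the following consequence
of Lemma 6.4.5:

Corollary 6.4.6 Consider independent standard Gaussian r.v.s $(g_i)$. Given a number
$A > 0$, we may find a number $c$ large enough such that the r.v.s $\xi_i = g_i\mathbf{1}_{\{|g_i| > c\}}$
satisfy the following property. Consider an integer $N$ and numbers $a, b > 0$ such
that

$$\sqrt{\log N} \le \frac{a}{b}. \tag{6.26}$$

For $\ell \le N$, consider $t_\ell = (t_{\ell,i})_{i\ge1}$ with $\|t_\ell\|_2 \le 2a$ and $\|t_\ell\|_\infty \le b$. Then

$$\mathsf{E}\sup_{\ell\le N}\sum_{i\ge1}\xi_i t_{\ell,i} \le \frac{a}{A}\sqrt{\log N}. \tag{6.27}$$

If, instead of $\xi_i$, we had $g_i$ in the left-hand side, we would obtain a bound $La\sqrt{\log N}$
(see (2.15)). The content of the corollary is that we can improve on that bound by a
large constant factor by taking $c$ large.

Proof Given a number $B > 0$, we have $\mathsf{E}\exp B|g_i| < \infty$, and it is obvious that
for $c$ large enough we have $\mathsf{E}\exp B|\xi_i| \le 2$. Given $\ell$, we use the Bernstein-like
inequality (6.25) with $W_i = \xi_i t_{\ell,i}$ and $a_i = |t_{\ell,i}|/B$, so that $\sum_{i\ge1} a_i^2 \le 4a^2/B^2$ and
$\sup_{i\ge1}|a_i| \le b/B$, and we obtain

$$\mathsf{P}\Big(\sum_{i\ge1}\xi_i t_{\ell,i} \ge v\Big) \le L\exp\Big(-\frac{1}{L}\min\Big(\frac{v^2B^2}{a^2}, \frac{vB}{b}\Big)\Big),$$

so that, using (6.26) in the first inequality, for $vB \ge a\sqrt{\log N}$, we have

$$\mathsf{P}\Big(\sup_{\ell\le N}\sum_{i\ge1}\xi_i t_{\ell,i} \ge v\Big) \le LN\exp\Big(-\frac{1}{L}\min\Big(\frac{v^2B^2}{a^2}, \frac{vB\sqrt{\log N}}{a}\Big)\Big) \le LN\exp\Big(-\frac{vB\sqrt{\log N}}{La}\Big), \tag{6.28}$$

and (2.6) implies that $\mathsf{E}\sup_{\ell\le N}\sum_{i\ge1}\xi_i t_{\ell,i} \le La\sqrt{\log N}/B$. Taking $B = LA$ for a suitable
constant $L$ yields (6.27). □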
We are now ready to perform the main step of the proof of Theorem 6.4.1.

Proposition 6.4.7 Assume the hypotheses of Theorem 6.4.1 and that furthermore

$$\sqrt{\log m} \le \frac{a}{b}; \tag{6.29}$$
$$\forall \ell \le m,\quad \|t_\ell\|_2 \le 2a. \tag{6.30}$$

Then (6.21) holds.

Condition (6.29) is motivated by the important idea that the critical case of
Theorem 6.4.1 is where the two terms in the minimum in the right-hand side
of (6.21) are nearly equal.

Proof Consider a parameter $c > 0$ and define $\xi_i = g_i\mathbf{1}_{\{|g_i| > c\}}$ and $\xi'_i = g_i\mathbf{1}_{\{|g_i| \le c\}}$.
Thus, using (6.22) in the first inequality,

$$\frac{a}{L}\sqrt{\log m} \le \mathsf{E}\sup_{\ell\le m}\sum_{i\ge1} g_i t_{\ell,i} \le \mathsf{E}\sup_{\ell\le m}\sum_{i\ge1}\xi'_i t_{\ell,i} + \mathsf{E}\sup_{\ell\le m}\sum_{i\ge1}\xi_i t_{\ell,i}. \tag{6.31}$$

Now, using Corollary 6.4.6 for $A = 2L$ shows that if $c$ is a large enough constant,
we have

$$\mathsf{E}\sup_{\ell\le m}\sum_{i\ge1}\xi_i t_{\ell,i} \le \frac{a}{2L}\sqrt{\log m}, \tag{6.32}$$

so that (6.31) implies

$$\frac{a}{2L}\sqrt{\log m} \le \mathsf{E}\sup_{\ell\le m}\sum_{i\ge1}\xi'_i t_{\ell,i} \le c\,\mathsf{E}\sup_{\ell\le m}\sum_{i\ge1}\varepsilon_i t_{\ell,i}, \tag{6.33}$$

where the last inequality is obtained by copying the argument of (6.24). Using (6.29)
shows that $\min(a\sqrt{\log m}, a^2/b) = a\sqrt{\log m}$, so that (6.33) implies (6.21). □


Proposition 6.4.8 The conclusion of Theorem 6.4.1 holds true if we assume
moreover that $\|t_\ell\|_2 \le 2a$ for each $\ell \le m$.

The improvement over Proposition 6.4.7 is that we no longer assume (6.29).

Proof It follows from Lemma 6.3.6 and (6.19) that

$$\mathsf{E}\sup_{\ell\le m}\sum_{i\ge1} t_{\ell,i}\varepsilon_i \ge \frac{a}{L}. \tag{6.34}$$

Assume first that $a/b \le \sqrt{\log 2}$. Then $a \ge a^2/(Lb)$ and (6.34) implies (6.21). Thus,
it suffices to prove (6.21) when $a/b > \sqrt{\log 2}$. For this, consider the largest integer
$N \le m$ for which

$$\sqrt{\log N} \le \frac{a}{b}. \tag{6.35}$$

Then $N \ge 2$. Next we prove that

$$a\sqrt{\log N} \ge \frac{1}{L}\min\Big(a\sqrt{\log m}, \frac{a^2}{b}\Big). \tag{6.36}$$

Indeed this is obvious if $N = m$. When $N < m$, the definition of $N$ shows that
$\sqrt{\log(N+1)} > a/b$, so that $a\sqrt{\log N} \ge a^2/(Lb)$, proving (6.36). Consequently,
it suffices to prove (6.21) when $m$ is replaced by $N$, and then (6.29) is satisfied
according to (6.35). The conclusion follows from Proposition 6.4.7. □

Proof of Theorem 6.4.1 Let $T = \{t_1, \ldots, t_m\}$, so that we want to prove (6.21), i.e.,

$$b(T) \ge \frac{1}{L}\min\Big(a\sqrt{\log m}, \frac{a^2}{b}\Big). \tag{6.37}$$

We have proved in Proposition 6.4.8 that when furthermore we have $\|t_\ell\|_2 \le 2a$ for
each $\ell$, then

$$b(T) \ge \frac{1}{L_0}\min\Big(a\sqrt{\log m}, \frac{a^2}{b}\Big). \tag{6.38}$$

To prove (6.37), we may assume that

$$b(T) < \frac{a^2}{8L_0 b} \tag{6.39}$$

because there is nothing to prove otherwise. Consider a point $t \in T$ and an integer
$k \ge 0$. The proof then relies on a simple iteration procedure. Assume that in the ball
$B(t, 2^k a)$, we can find points $u_1, \ldots, u_N \in T$ with $d(u_\ell, u_{\ell'}) \ge 2^{k-1}a$ whenever
$\ell \ne \ell'$. We can then use (6.38) for the points $u_1 - t, \ldots, u_N - t$, with $2^{k-1}a$ instead
of $a$ and $2b$ instead of $b$, to obtain

$$b(T) \ge \frac{1}{L_0}\min\Big(2^{k-1}a\sqrt{\log N}, \frac{2^{2k-2}a^2}{2b}\Big).$$

Using (6.39), this implies that $Lb(T) \ge 2^k a\sqrt{\log N}$. Hence,
$N \le M_k := \exp(L2^{-2k}b(T)^2/a^2)$. Thus any ball $B(t, 2^k a)$ of $T$ can be covered using at most
$M_k$ balls of $T$ of radius $2^{k-1}a$. We then iterate this result as follows: Consider any
number $k_0$ large enough so that $T \subset B(t, 2^{k_0}a)$ for a certain $t \in T$. Then we can
cover $T$ by at most $M_{k_0}$ balls centered in $T$ of radius $2^{k_0-1}a$. Each of these balls
can in turn be covered by at most $M_{k_0-1}$ balls of $T$ of radius $2^{k_0-2}a$, so that $T$
can be covered by at most $M_{k_0}M_{k_0-1}$ such balls. Continuing in this manner until
we cover $T$ by balls of radius $a/2$ requires at most $\prod_{k\ge0} M_k$ balls of radius $a/2$.
Since $t_\ell \notin B(t_{\ell'}, a/2)$ for $\ell \ne \ell'$, these balls of radius $a/2$ contain a single point
of $T$, and we have shown that $m \le \prod_{k\ge0} M_k \le \exp(Lb(T)^2/a^2)$, i.e.,
$b(T) \ge a\sqrt{\log m}/L$. □


Combining our results, we may now prove a version of Proposition 2.10.8 for
Bernoulli processes.

Proposition 6.4.9 There exist constants $L_1$ and $L_2$ with the following properties.
Consider numbers $a, b, \sigma > 0$ and vectors $t_1, \ldots, t_m \in \ell^2$ that satisfy (6.19)
and (6.20). For $\ell \le m$, consider sets $H_\ell$ with $H_\ell \subset B_2(t_\ell, \sigma)$. Then

$$b\Big(\bigcup_{\ell\le m} H_\ell\Big) \ge \frac{1}{L_1}\min\Big(a\sqrt{\log m}, \frac{a^2}{b}\Big) - L_2\sigma\sqrt{\log m} + \min_{\ell\le m} b(H_\ell). \tag{6.40}$$

The proof is identical to that of Proposition 2.10.8, if one replaces Lemmas 2.10.2
and 2.10.6, respectively, by Theorems 6.4.1 and 6.3.1.


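Corollary 6.4.10 There exists a constant $L_0$ with the following property. Consider
a set $D$ with $\Delta(D, d_\infty) \le 4a/\sqrt{\log m}$, and for $\ell \le m$, consider points $t_\ell \in D$
that satisfy (6.19), i.e., $\|t_\ell - t_{\ell'}\|_2 \ge a$ for $\ell \ne \ell'$. Consider moreover sets
$H_\ell \subset B_2(t_\ell, a/L_0)$. Then

$$b\Big(\bigcup_{\ell\le m} H_\ell\Big) \ge \frac{a}{L_0}\sqrt{\log m} + \min_{\ell\le m} b(H_\ell). \tag{6.41}$$

Proof Since $b(T - t_1) = b(T)$, we may assume without loss of generality that
$t_1 = 0$. Since $\Delta(D, d_\infty) \le b := 4a/\sqrt{\log m}$, we have $\|t_\ell\|_\infty \le b$ for all $\ell \le m$.
Moreover $a^2/b = a\sqrt{\log m}/4$, so that (6.40) used for $\sigma = a/L_0$ gives

$$b\Big(\bigcup_{\ell\le m} H_\ell\Big) \ge \frac{1}{4L_1}a\sqrt{\log m} - \frac{L_2}{L_0}a\sqrt{\log m} + \min_{\ell\le m} b(H_\ell),$$

so that if $L_0 \ge 8L_1L_2$ and $L_0 \ge 8L_1$, we get (6.41). □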


This corollary will be an essential tool to prove the Latała-Bednorz theorem.


6.5 Comparison Principle 


Our last fundamental result is a comparison principle. Let us say that a map $\theta$ from
$\mathbb{R}$ to $\mathbb{R}$ is a contraction if $|\theta(s) - \theta(t)| \le |s - t|$ for each $s, t \in \mathbb{R}$.
Theorem 6.5.1 For $i \ge 1$, consider contractions $\theta_i$ with $\theta_i(0) = 0$. Then for each
(finite) subset $T$ of $\ell^2$, we have

$$\mathsf{E}\sup_{t\in T}\sum_{i\ge1}\varepsilon_i\theta_i(t_i) \le b(T) = \mathsf{E}\sup_{t\in T}\sum_{i\ge1}\varepsilon_i t_i. \tag{6.42}$$

A more general comparison result may be found in [112, Theorem 2.1]. We give
here only the simpler proof of the special case (6.42) that we need.

Proof The purpose of the condition $\theta_i(0) = 0$ is simply to ensure that $(\theta_i(t_i)) \in \ell^2$
whenever $(t_i) \in \ell^2$. A simple approximation procedure shows that it suffices to
show that for each $N$, we have

$$\mathsf{E}\sup_{t\in T}\sum_{1\le i\le N}\varepsilon_i\theta_i(t_i) \le \mathsf{E}\sup_{t\in T}\sum_{1\le i\le N}\varepsilon_i t_i.$$

By iteration, it suffices to show that $\mathsf{E}\sup_{t\in T}\sum_{1\le i\le N}\varepsilon_i t_i$ decreases when $t_1$ is
replaced by $\theta_1(t_1)$. By conditioning on $\varepsilon_2, \varepsilon_3, \ldots, \varepsilon_N$, it suffices to prove that for a
subset $T$ of $\mathbb{R}^2$ and a contraction $\theta$, we have

$$\mathsf{E}\sup_{t=(t_1,t_2)\in T}(\varepsilon_1\theta(t_1) + t_2) \le \mathsf{E}\sup_{t=(t_1,t_2)\in T}(\varepsilon_1 t_1 + t_2). \tag{6.43}$$

Now

$$2\mathsf{E}\sup_{t=(t_1,t_2)\in T}(\varepsilon_1\theta(t_1) + t_2) = \sup_{s'\in T}(\theta(s'_1) + s'_2) + \sup_{s\in T}(-\theta(s_1) + s_2).$$

Thus to prove (6.43) it suffices to show that for $s, s' \in T$, we have

$$\theta(s'_1) + s'_2 - \theta(s_1) + s_2 \le 2\mathsf{E}\sup_{t=(t_1,t_2)\in T}(\varepsilon_1 t_1 + t_2). \tag{6.44}$$

To bound the right-hand side from below, we may take either $t = s'$ when $\varepsilon_1 = 1$
and $t = s$ when $\varepsilon_1 = -1$, or the opposite:

$$2\mathsf{E}\sup_{t=(t_1,t_2)\in T}(\varepsilon_1 t_1 + t_2) \ge \max(s'_1 + s'_2 - s_1 + s_2,\ s_1 + s_2 - s'_1 + s'_2) = s_2 + s'_2 + |s'_1 - s_1|,$$

so that (6.44) simply follows from the fact that $\theta(s'_1) - \theta(s_1) \le |s'_1 - s_1|$ since $\theta$ is
a contraction. □
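Corollary 6.5.2 For each subset $T$ of $\ell^2$, we have
$$\mathsf{E}\sup_{t\in T}\Big|\sum_{i\ge1}\varepsilon_i|t_i|\Big| \le 2\mathsf{E}\sup_{t\in T}\Big|\sum_{i\ge1}\varepsilon_i t_i\Big|.$$

Proof Writing $x^+ = \max(x, 0)$, we have $|x| = x^+ + (-x)^+$, so that by symmetry

$$\mathsf{E}\sup_{t\in T}\Big|\sum_{i\ge1}\varepsilon_i|t_i|\Big| \le 2\mathsf{E}\sup_{t\in T}\Big(\sum_{i\ge1}\varepsilon_i|t_i|\Big)^+ = 2\mathsf{E}\sup_{t\in T'}\sum_{i\ge1}\varepsilon_i|t_i|,$$

where $T' = T \cup \{0\}$. Now, using (6.42) in the first inequality, we have

$$\mathsf{E}\sup_{t\in T'}\sum_{i\ge1}\varepsilon_i|t_i| \le \mathsf{E}\sup_{t\in T'}\sum_{i\ge1}\varepsilon_i t_i \le \mathsf{E}\sup_{t\in T}\Big|\sum_{i\ge1}\varepsilon_i t_i\Big|.$$

□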
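A finite check of (6.42) and of Corollary 6.5.2 follows (a sketch, not part of the text). For a random finite $T \subset \mathbb{R}^{10}$, we compute all expectations exactly by enumerating the $2^{10}$ sign patterns. The contractions used here, $|x|$ and $\sin x$, are both 1-Lipschitz and vanish at 0. They, the helper `expected_sup` and the seed are hypothetical choices.

```python
import random
from itertools import product
from math import sin


def expected_sup(T, theta=None, outer_abs=False):
    """E sup_{t in T} F(sum_i eps_i theta(i, t_i)) computed exactly, with F = |.| if outer_abs else identity.

    theta=None stands for theta_i(x) = x.
    """
    n = len(T[0])
    rows = [list(t) if theta is None else [theta(i, x) for i, x in enumerate(t)] for t in T]
    total = 0.0
    for signs in product((-1, 1), repeat=n):
        values = (sum(e * x for e, x in zip(signs, row)) for row in rows)
        total += max(abs(v) for v in values) if outer_abs else max(values)
    return total / 2 ** n


def theta(i, x):
    """Hypothetical contractions with theta_i(0) = 0: |x| on even coordinates, sin x on odd ones."""
    return abs(x) if i % 2 == 0 else sin(x)


random.seed(2)
T = [[random.uniform(-2, 2) for _ in range(10)] for _ in range(6)]   # a finite subset of R^10
lhs, rhs = expected_sup(T, theta), expected_sup(T)
print(lhs, rhs)                  # (6.42): lhs <= rhs
abs_lhs = expected_sup(T, lambda i, x: abs(x), outer_abs=True)
abs_rhs = expected_sup(T, outer_abs=True)
print(abs_lhs, 2 * abs_rhs)      # Corollary 6.5.2: abs_lhs <= 2 * abs_rhs
```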


6.6 Control in $\ell^\infty$ Norm

Bernoulli processes are much easier to understand when the index set is small in the
supremum norm. The main result of this section goes in this direction. It is weaker
than Theorem 6.2.8 (the Latała-Bednorz Theorem), but the proof is much easier.


Theorem 6.6.1 There exists a universal constant $L$ such that for any subset $T$ of
$\ell^2$, we have

$$\gamma_2(T) \le L\Big(b(T) + \sqrt{b(T)\gamma_1(T, d_\infty)}\Big). \tag{6.45}$$


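Corollary 6.6.2 We have

$$b(T) \ge \frac{1}{L}\min\Big(\gamma_2(T), \frac{\gamma_2(T)^2}{\gamma_1(T, d_\infty)}\Big). \tag{6.46}$$

Proof Denoting by $L^*$ the constant of (6.45), if $b(T) \le \gamma_2(T)/(2L^*)$, then (6.45)
implies

$$\gamma_2(T) \le \gamma_2(T)/2 + L^*\sqrt{b(T)\gamma_1(T, d_\infty)},$$

hence $b(T) \ge \gamma_2(T)^2/(4(L^*)^2\gamma_1(T, d_\infty))$. □

Exercise 6.6.3 Find examples of situations where $\gamma_2(T) \ll \gamma_1(T, d_\infty)$ and $b(T)$ is
of order $\gamma_2(T)^2/\gamma_1(T, d_\infty)$, not $\gamma_2(T)$. Hint: Try cases where $t_i \in \{0, 1\}$ for each $i$
and each $t$.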


We recall the constant $L_0$ of Corollary 6.4.10. The next lemma is our main tool.


Lemma 6.6.4 Consider a number $r \ge 2L_0$. Consider $B \subset \ell^2$ such that
$\Delta(B, d_\infty) \le 4a/\sqrt{\log m}$. Then we can find a partition $(A_\ell)_{\ell\le m}$ of $B$ into sets which
have either of the following properties:

$$\Delta(A_\ell, d_2) \le 2a, \tag{6.47}$$

or else

$$t \in A_\ell \Rightarrow b(B \cap B_2(t, 2a/r)) \le b(B) - \frac{a}{L_0}\sqrt{\log m}. \tag{6.48}$$

Proof The proof is almost identical to that of Lemma 2.9.4, using now Corollary 6.4.10.
Consider the set

$$C = \Big\{t \in B;\ b(B \cap B_2(t, 2a/r)) > b(B) - \frac{a}{L_0}\sqrt{\log m}\Big\}.$$

Consider points $(t_\ell)_{\ell\le m'}$ in $C$ such that $d_2(t_\ell, t_{\ell'}) \ge a$ for $\ell \ne \ell'$. Since
$2/r \le 1/L_0$, using (6.41) for the sets $H_\ell := B \cap B_2(t_\ell, 2a/r)$ shows that

$$b(B) \ge b\Big(\bigcup_{\ell\le m'} H_\ell\Big) \ge \frac{a}{L_0}\sqrt{\log m'} + \min_{\ell\le m'} b(B \cap B_2(t_\ell, 2a/r)) > \frac{a}{L_0}\sqrt{\log m'} + b(B) - \frac{a}{L_0}\sqrt{\log m}, \tag{6.49}$$

and thus $m' < m$. Consequently, by Lemma 2.9.3(b), we may cover $C$ by $m' < m$
balls $B_\ell$ of radius $a$. We then set $A_\ell = C \cap (B_\ell \setminus \bigcup_{\ell'<\ell} B_{\ell'})$ for $\ell \le m'$,
$A_\ell = \emptyset$ for $m' < \ell < m$, and $A_m = B \setminus C$. □


Proof of Theorem 6.6.1 The reader should review the proof of Theorem 2.9.1 now.
We fix $r$ as in Lemma 6.6.4. We consider an integer $\tau \ge 2$ to be specified later and
an admissible sequence of partitions $(\mathcal{D}_n)$ of $T$ such that

$$\sup_{t\in T}\sum_{n\ge0} 2^n\Delta(D_n(t), d_\infty) \le 2\gamma_1(T, d_\infty). \tag{6.50}$$

By induction over $n$, we construct an admissible sequence $(\mathcal{A}_n)$ of partitions of $T$,
and for $A \in \mathcal{A}_n$, an integer $j_n(A) \in \mathbb{Z}$ such that $\Delta(A, d_2) \le 2r^{-j_n(A)}$. For $n \le \tau$,
we set $\mathcal{A}_n = \{T\}$, and for $A \in \mathcal{A}_n$, we set $j_n(A) = j_0$, where $j_0$ is the largest integer
with $\Delta(T, d_2) \le 2r^{-j_0}$.

Assuming now that $\mathcal{A}_n$ has been constructed for some $n \ge \tau$, with $\operatorname{card}\mathcal{A}_n \le N_n$,
we consider $B \in \mathcal{A}_n$, and we proceed to partition it. First we partition $B$ into the
elements $B \cap D$ for $D \in \mathcal{D}_{n-1}$. Consider such a set $B \cap D$.


First Case Assume that

$$\Delta(D, d_\infty) \le \frac{4r^{-j_n(B)-1}}{\sqrt{\log N_{n-\tau}}}. \tag{6.51}$$

We may then apply Lemma 6.6.4 with $a = r^{-j_n(B)-1}$ and $m = N_{n-\tau}$ to partition $B \cap D$ into
sets $(A_{\ell,B,D})_{\ell\le N_{n-\tau}}$ such that either $\Delta(A_{\ell,B,D}, d_2) \le 2r^{-j_n(B)-1}$ (and we then
set $j_{n+1}(A_{\ell,B,D}) = j_n(B) + 1$) or else

$$t \in A_{\ell,B,D} \Rightarrow b(B \cap D \cap B_2(t, 2r^{-j_n(B)-2})) \le b(B \cap D) - \frac{1}{L_0}r^{-j_n(B)-1}\sqrt{\log N_{n-\tau}},$$

and we then set $j_{n+1}(A_{\ell,B,D}) = j_n(B)$. In that second case, we have in particular

$$t \in A_{\ell,B,D} \Rightarrow b(A_{\ell,B,D} \cap B_2(t, 2r^{-j_n(B)-2})) \le b(B) - \frac{1}{L_0}r^{-j_n(B)-1}\sqrt{\log N_{n-\tau}}. \tag{6.52}$$

Second Case Assume that (6.51) fails. In this case, we decide that $B \cap D \in \mathcal{A}_{n+1}$,
and we set $j_{n+1}(B \cap D) = j_n(B)$. Since (6.51) fails, we then have

$$2^{n/2}r^{-j_n(B)} \le Lr2^{n-\tau/2}\Delta(D, d_\infty). \tag{6.53}$$

To sum up, the partition $\mathcal{A}_{n+1}$ consists of all the sets $B \cap D$ (with $B \in \mathcal{A}_n$,
$D \in \mathcal{D}_{n-1}$) for which (6.51) fails, as well as of all the sets $A_{\ell,B,D} \subset B \cap D$ for pairs $B, D$
(with $B \in \mathcal{A}_n$, $D \in \mathcal{D}_{n-1}$) which satisfy (6.51). This completes the construction.
We have as desired $\operatorname{card}\mathcal{A}_{n+1} \le N_nN_{n-1}N_{n-\tau} \le N_n^2 = N_{n+1}$.

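Let us fix $t \in T$ and set $j(n) = j_n(A_n(t))$. Let $a(n) = 2^{n/2}r^{-j(n)}$. We are going
to prove that

$$2^{-\tau/2}\sum_{n\ge0} a(n) \le Lr\big(b(T) + 2^{-\tau}\gamma_1(T, d_\infty)\big). \tag{6.54}$$

Since $\Delta(A_n(t), d_2) \le 2r^{-j(n)}$, it then follows that
$$\gamma_2(T, d_2) \le Lr\big(2^{\tau/2}b(T) + 2^{-\tau/2}\gamma_1(T, d_\infty)\big),$$

and we finish the proof by optimization over $\tau$: if $\gamma_1(T, d_\infty) \le 4b(T)$, we take
$\tau = 2$; otherwise, we take $2^\tau$ about $\gamma_1(T, d_\infty)/b(T)$.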

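The key property of the construction is that if $B \in \mathcal{A}_n$, $A \in \mathcal{A}_{n+1}$, $A \subset B$, and
$j_n(B) = j_{n+1}(A)$, then either (by (6.53)) there exists $D \in \mathcal{D}_{n-1}$ with $A \subset D$ and

$$2^{n/2}r^{-j_n(B)} \le Lr2^{n-\tau/2}\Delta(D, d_\infty), \tag{6.55}$$

or else by (6.52)

$$t \in A \Rightarrow b(A \cap B_2(t, 2r^{-j_n(B)-2})) \le b(B) - \frac{1}{L_0}r^{-j_n(B)-1}\sqrt{\log N_{n-\tau}}. \tag{6.56}$$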

The proof of (6.54) is nearly identical to the part of the proof of Theorem 2.9.1
following Eq. (2.88). We let the reader prove that the sequence $(a(n))$ is bounded.
Consider then the set $I$ as provided by Lemma 2.9.5 for $\alpha = \sqrt{2}$. It suffices to
prove (6.54) when the sum over $n \ge 0$ is replaced by the sum over $n \in I \setminus \{0\}$. As
in (2.91), for $n \in I \setminus \{0\}$, we have $j(n - 1) = j(n)$ and $j(n + 1) = j(n) + 1$. Let us
enumerate the elements⁵ of $I \setminus \{0\}$ as $n_1 < n_2 < \cdots$, so that $j(n_{k+1}) \ge j(n_k) + 1$.

We consider $k \ge 1$ (so that $n_k \ge 1$), and we proceed to bound $a(n_k)$. Since
$j(n_{k+2}) \ge j(n_{k+1}) + 1 \ge j(n_k) + 2$, we have

$$\Delta(A_{n_{k+2}}(t), d_2) \le 2r^{-j(n_{k+2})} \le 2r^{-j(n_k)-2}. \tag{6.57}$$

Let us define $n = n_k - 1$, so that $j(n + 1) = j(n_k)$, and define also $B = A_n(t)$,
$A = A_{n_k}(t) = A_{n+1}(t)$, so that $j_n(B) = j(n) = j(n + 1) = j(n_k) = j_{n+1}(A)$. We
know that either (6.55) or (6.56) holds. If (6.56) holds, we conclude by (6.57) that
$A_{n_{k+2}}(t) \subset A \cap B_2(t, 2r^{-j(n)-2})$, so that

$$2^{-\tau/2}a(n_k) \le Lr\big(b(A_n(t)) - b(A_{n_{k+2}}(t))\big). \tag{6.58}$$

If, on the other hand, (6.55) holds, we obtain

$$a(n_k) \le Lr2^{n-\tau/2}\Delta(D_{n-1}(t), d_\infty). \tag{6.59}$$

As in Theorem 2.9.1, summation of these inequalities concludes the proof
of (6.54). □


6.7 Peaky Parts of Functions

One basic idea underlying the Bernoulli conjecture is that a sequence $(t_i)$ has
a "spread out part" and a "peaky part". The r.v.s $\sum_{i\ge1}\varepsilon_i t_i$ are controlled by
comparison with Gaussian processes for the spread out parts and by taking absolute
values for the peaky parts. The notions of "spread out" and "peaky" parts refer to the
space $\ell^2(\mathbb{N})$. In this section, we study how to perform the same decomposition for
functions in $L^2(\nu)$, where $\nu$ is a positive measure (which need not be a probability).
The case where the measure space is $\mathbb{N}$ and $\nu$ is the counting measure $\nu(A) = \operatorname{card} A$
is the previous case of $\ell^2(\mathbb{N})$. In this section, the distances $d_2$ and $d_\infty$ refer to the
distances induced by the norms in $L^2(\nu)$ and $L^\infty(\nu)$.

For a single function, it is quite obvious what to do, and this is spelled out in the 
next lemma. 


⁵ We assume here that $I$ is infinite, leaving to the reader the necessary simple modifications of the
argument when $I$ is finite.
Lemma 6.7.1 Consider $f \in L^2(\nu)$ and $u > 0$. Then we can write $f = f_1 + f_2$
where

$$\|f_1\|_2 \le \|f\|_2,\quad \|f_1\|_\infty \le u;\qquad \|f_2\|_2 \le \|f\|_2,\quad \|f_2\|_1 \le \frac{\|f\|_2^2}{u}. \tag{6.60}$$

Proof We set $f_1 = f\mathbf{1}_{\{|f|\le u\}}$, so that the first part of (6.60) is obvious. We set
$f_2 = f\mathbf{1}_{\{|f|>u\}} = f - f_1$, so that

$$u\|f_2\|_1 = \int u|f|\mathbf{1}_{\{|f|>u\}}\,d\nu \le \int f^2\,d\nu = \|f\|_2^2.$$

□

Matters are very much more difficult when one deals with a class of functions
and the goal is to decompose simultaneously all functions in the class. Our
top-of-the-line result in this direction is surprisingly sophisticated, so we start here
with a simpler, yet non-trivial result. This result has its own importance, as it will be
used to study empirical processes. We denote by $B_1$ the unit ball of $L^1(\nu)$.


Theorem 6.7.2 Consider a countable set $T \subset L^2(\nu)$ and a number $u > 0$. Assume
that $S = \gamma_2(T, d_2) < \infty$. Then there is a decomposition $T \subset T_1 + T_2$ where

$$\gamma_2(T_1, d_2) \le LS;\qquad \gamma_1(T_1, d_\infty) \le LSu; \tag{6.61}$$
$$\gamma_2(T_2, d_2) \le LS;\qquad T_2 \subset \frac{LS}{u}B_1. \tag{6.62}$$

Here as usual $T_1 + T_2 = \{t_1 + t_2;\ t_1 \in T_1,\ t_2 \in T_2\}$. The sets $T_1$ and $T_2$ are
not really larger than $T$ with respect to $\gamma_2$. Moreover, for each of them, we have
some extra information: we control $\gamma_1(T_1, d_\infty)$, and we control the $L^1$ norm of the
elements of $T_2$. In some sense, Theorem 6.7.2 is an extension of Lemma 6.7.1, which
deals with the case where $T$ consists of a single function.

We will present two proofs of the theorem: The first proof is easy to discover, as it 
implements what is the most obvious approach. The second proof, while somewhat 
simpler, is far less intuitive. It is the second proof which will be useful in the long 
run. 


Proof The idea is simply to write an element of T as the sum of the increments 
along a chain and to apply Lemma 6.7.1 to each of these increments. We will also 
take advantage of the fact that T is countable to write each element of T as the sum 
of the increments along a chain of finite length, but this is not an essential part of 
the argument. 

As usual, $\Delta_2(A)$ denotes the diameter of $A$ for the distance $d_2$. We consider an
admissible sequence of partitions $(\mathcal{A}_n)_{n\ge0}$ with

$$\sup_{t\in T}\sum_{n\ge0} 2^{n/2}\Delta_2(A_n(t)) \le 2\gamma_2(T, d_2). \tag{6.63}$$
For each $n \ge 0$ and each $A \in \mathcal{A}_n$, we are going to pick a point $t_{n,A} \in A$. It
will be convenient to ensure that each point of $T$ is of the type $t_{n,A}$ for a certain $n$
and a certain $A \in \mathcal{A}_n$. To ensure this, we enumerate $T$ as $(t_n)_{n\ge0}$. For $A \in \mathcal{A}_n$, we
choose for $t_{n,A} \in A$ any point we want unless $A = A_n(t_n)$, in which case we choose
$t_{n,A} = t_n$. For $t \in T$ and $n \ge 0$, let us define $\pi_n(t) = t_{n,A}$ where $A = A_n(t)$. Thus,
if $t = t_n$, then $A = A_n(t_n)$, and by construction, $\pi_n(t) = t_{n,A} = t_n$. For $n \ge 1$, let
us set $f_{t,n} = \pi_n(t) - \pi_{n-1}(t)$. Thus

$$\|f_{t,n}\|_2 \le \Delta_2(A_{n-1}(t)). \tag{6.64}$$

Moreover, $f_{t,n}$ depends only on $A_n(t)$: if $A_n(s) = A_n(t)$, then $A_{n-1}(t) = A_{n-1}(s)$
and $f_{t,n} = f_{s,n}$. Thus as $t$ varies in $T$, there are at most $N_n$ different functions $f_{t,n}$.
Using Lemma 6.7.1 with $2^{-n/2}u\|f_{t,n}\|_2$ instead of $u$, we can decompose
$f_{t,n} = f^1_{t,n} + f^2_{t,n}$ where

$$\|f^1_{t,n}\|_2 \le \|f_{t,n}\|_2,\qquad \|f^1_{t,n}\|_\infty \le 2^{-n/2}u\|f_{t,n}\|_2, \tag{6.65}$$
$$\|f^2_{t,n}\|_2 \le \|f_{t,n}\|_2,\qquad \|f^2_{t,n}\|_1 \le \frac{2^{n/2}}{u}\|f_{t,n}\|_2. \tag{6.66}$$

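To construct the sets $T_1$ and $T_2$, given $t \in T$, we set $g^1_{t,0} = t_{0,T}$ and $g^2_{t,0} = 0$, while
if $n \ge 1$, we set

$$g^1_{t,n} = t_{0,T} + \sum_{1\le k\le n} f^1_{t,k}\,, \qquad g^2_{t,n} = \sum_{1\le k\le n} f^2_{t,k}\,.$$

We set

$$T^1_n = \{g^1_{t,m};\ m \le n,\ t \in T\}\,, \qquad T^2_n = \{g^2_{t,m};\ m \le n,\ t \in T\}\,,$$

so that the sequences $(T^1_n)_{n\ge 1}$ and $(T^2_n)_{n\ge 1}$ are increasing. We set

$$T_1 = \bigcup_{n\ge 0} T^1_n\,, \qquad T_2 = \bigcup_{n\ge 0} T^2_n\,.$$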


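We prove now that $T \subset T_1 + T_2$. Indeed, if $t \in T$, then $t = t_n$ for some $n$ and we
have arranged that then $\pi_n(t) = t$. Since $\pi_0(t) = t_{0,T}$, we have

$$t - t_{0,T} = \pi_n(t) - \pi_0(t) = \sum_{1\le k\le n}\big(\pi_k(t) - \pi_{k-1}(t)\big) = \sum_{1\le k\le n} f_{t,k} = \sum_{1\le k\le n} f^1_{t,k} + \sum_{1\le k\le n} f^2_{t,k}\,,$$

so that $t = g^1_{t,n} + g^2_{t,n} \in T_1 + T_2$.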
We now start the proof of (6.61). Since for $j = 1, 2$ the element $g^j_{t,n}$ depends only
on $A_n(t)$, we have $\operatorname{card} T^j_n \le N_0 + \cdots + N_n$, so that $\operatorname{card} T^j_0 = 1$ and
$\operatorname{card} T^j_n \le N_{n+1}$. Consider $t^1 \in T_1$, so that $t^1 = g^1_{t,m}$ for some $m$ and some $t \in T$. If $m \le n$,
we have $t^1 = g^1_{t,m} \in T^1_n$ so that $d_2(t^1, T^1_n) = 0$. If $m > n$, we have $g^1_{t,n} \in T^1_n$, so
that, using in succession the first part of (6.65) and (6.64) in the third inequality, we
get

$$d_2(t^1, T^1_n) \le d_2(g^1_{t,m}, g^1_{t,n}) = \|g^1_{t,m} - g^1_{t,n}\|_2 \le \sum_{k>n}\|f^1_{t,k}\|_2 \le \sum_{k>n}\Delta_2(A_{k-1}(t))\,. \tag{6.67}$$


Hence, using (6.63) in the last inequality,

$$\sum_{n\ge 0} 2^{n/2} d_2(t^1, T^1_n) \le \sum_{n\ge 0,\,k>n} 2^{n/2}\Delta_2(A_{k-1}(t)) \le L\sum_{k\ge 1} 2^{k/2}\Delta_2(A_{k-1}(t)) \le LS\,.$$

It then follows from Proposition 2.9.7 that $\gamma_2(T_1, d_2) \le LS$. The proof that
$\gamma_2(T_2, d_2) \le LS$ is identical, using now the first part of (6.66) rather than the first
part of (6.65).
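The passage from the double sum over $n \ge 0$, $k > n$ to a single sum over $k \ge 1$ only uses a geometric series. Below it is spelled out in full. This is our own elaboration: it makes the constant $L$ explicit and bounds the result in terms of $\gamma_2(T, d_2)$ through (6.63).

```latex
\begin{align*}
\sum_{n\ge 0}\sum_{k>n} 2^{n/2}\Delta_2(A_{k-1}(t))
 &= \sum_{k\ge 1}\Big(\sum_{0\le n<k} 2^{n/2}\Big)\Delta_2(A_{k-1}(t))
  \le (\sqrt2+1)\sum_{k\ge 1} 2^{k/2}\Delta_2(A_{k-1}(t)) \\
 &= \sqrt2\,(\sqrt2+1)\sum_{j\ge 0} 2^{j/2}\Delta_2(A_j(t))
  \le (4+2\sqrt2)\,\gamma_2(T,d_2)\,,
\end{align*}
```

The first inequality uses $\sum_{0\le n<k}2^{n/2} = (2^{k/2}-1)/(\sqrt2-1) \le (\sqrt2+1)2^{k/2}$. For the $\gamma_1$ estimate that follows, the same computation works with $\sum_{0\le n<k}2^{n} = 2^k - 1 \le 2^k$. This gives $\sum_{0\le n<k}2^{n-k/2} \le 2^{k/2}$.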
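To control $\gamma_1(T_1, d_\infty)$, we use the same approach. We replace (6.67) by

$$d_\infty(t^1, T^1_n) \le d_\infty(g^1_{t,m}, g^1_{t,n}) \le \sum_{k>n}\|f^1_{t,k}\|_\infty \le \sum_{k>n} 2^{-k/2}u\,\Delta_2(A_{k-1}(t))\,.$$

Hence

$$\sum_{n\ge 0} 2^n d_\infty(t^1, T^1_n) \le u\sum_{n\ge 0,\,k>n} 2^{n-k/2}\Delta_2(A_{k-1}(t)) \le Lu\sum_{k\ge 1} 2^{k/2}\Delta_2(A_{k-1}(t)) \le LuS\,,$$

and it follows again from (a suitable version of) Proposition 2.9.7 that $\gamma_1(T_1, d_\infty) \le
LSu$. Finally, (6.66) and (6.64) yield

$$\|g^2_{t,n}\|_1 \le \sum_{k\ge 1}\|f^2_{t,k}\|_1 \le \sum_{k\ge 1}\frac{2^{k/2}}{u}\Delta_2(A_{k-1}(t)) \le \frac{LS}{u}\,,$$

so that $T_2 \subset (LS/u)B_1$. This completes the proof. □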


Later, in Theorem 9.2.4, we will prove a far-reaching generalization of
Theorem 6.7.2 with sweeping consequences. Our proof of Theorem 9.2.4 will be based
on a slightly different idea. To prepare for this proof, we give an alternate proof of
Theorem 6.7.2 based on the same ideas.


Second proof of Theorem 6.7.2 We keep the notation of the first proof, and we
construct the points $\pi_n(t)$ as we did there. For $t \in T$, we define $\delta_n(t) := \Delta_2(A_n(t))$,
so that $\sup_{t\in T}\sum_{n\ge 0} 2^{n/2}\delta_n(t) \le LS$. We denote by $\Omega$ the underlying measure
space. Given $t \in T$ and $\omega \in \Omega$, we define

$$m(t,\omega) = \inf\big\{n \ge 0;\ |\pi_{n+1}(t)(\omega) - \pi_n(t)(\omega)| \ge u2^{-n/2}\delta_n(t)\big\} \tag{6.68}$$

if the set on the right is not empty and $m(t,\omega) = \infty$ otherwise. For $n \le n' \le
m(t,\omega)$, we have

$$|\pi_{n'}(t)(\omega) - \pi_n(t)(\omega)| \le \sum_{n\le p<n'}|\pi_{p+1}(t)(\omega) - \pi_p(t)(\omega)| \le u\sum_{n\le p<n'} 2^{-p/2}\delta_p(t)\,.$$

In particular when $m(t,\omega) = \infty$, the sequence $(\pi_n(t)(\omega))$ of real numbers is a
Cauchy sequence and hence convergent. When $m(t,\omega) = \infty$, we define

$$t^1(\omega) = \lim_{n\to\infty}\pi_n(t)(\omega)\,.$$

When $m(t,\omega) < \infty$ we define

$$t^1(\omega) = \pi_{m(t,\omega)}(t)(\omega)\,.$$

We define

$$t^2 = t - t^1\,; \qquad T_1 := \{t^1;\ t \in T\}\,; \qquad T_2 := \{t^2;\ t \in T\}\,,$$

and we proceed to prove (6.61) and (6.62). We define

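$$t^1_n(\omega) = \pi_{\min(m(t,\omega),n)}(t)(\omega)\,.$$

To match with the previous notation for $n \ge 1$, let us define

$$f^1_{t,n}(\omega) := t^1_n(\omega) - t^1_{n-1}(\omega) = \pi_{\min(m(t,\omega),n)}(t)(\omega) - \pi_{\min(m(t,\omega),n-1)}(t)(\omega) = \big(\pi_n(t)(\omega) - \pi_{n-1}(t)(\omega)\big)\mathbf{1}_{\{m(t,\cdot)\ge n\}}(\omega)\,,$$

so that $\|f^1_{t,n}\|_2 \le \Delta_2(A_{n-1}(t)) = \delta_{n-1}(t)$ and $\|f^1_{t,n}\|_\infty \le u2^{-(n-1)/2}\delta_{n-1}(t)$
because $|\pi_n(t)(\omega) - \pi_{n-1}(t)(\omega)| \le u2^{-(n-1)/2}\delta_{n-1}(t)$ when $m(t,\omega) \ge n$. Also
$t^1 = \lim_{n\to\infty} t^1_n = \pi_0(t) + \sum_{n\ge 1} f^1_{t,n}$ pointwise. The proof of (6.61) is then just as before. Let us now define

$$t^2_n := t^2\mathbf{1}_{\{m(t,\cdot)=n\}} = (t - \pi_n(t))\mathbf{1}_{\{m(t,\cdot)=n\}}\,. \tag{6.69}$$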
On the set $\{m(t,\omega) = \infty\}$, we have $t = t^1$ a.e. for the measure $\nu$ since
$\|t - \pi_n(t)\|_2 \to 0$ as $n \to \infty$. Thus, on that set, we have $t^2 = 0$ a.e., and consequently
a.e. we have

$$t^2 = \sum_{n\ge 0} t^2\mathbf{1}_{\{m(t,\cdot)=n\}} = \sum_{n\ge 0} t^2_n\,.$$

Since $\|t^2_n\|_2 \le \|t - \pi_n(t)\|_2 \le \delta_n(t)$, the proof that $\gamma_2(T_2, d_2) \le LS$ is as before.
Furthermore, using (6.69) and the Cauchy–Schwarz inequality, we have

$$\|t^2_n\|_1 \le \|t - \pi_n(t)\|_2\sqrt{\nu(\{m(t,\cdot) = n\})} \le \delta_n(t)\sqrt{\nu(\{m(t,\cdot) = n\})}\,.$$

Since $|\pi_{n+1}(t) - \pi_n(t)| \ge u2^{-n/2}\delta_n(t)$ on the set $\{m(t,\cdot) = n\}$, Markov's inequality
yields

$$\nu(\{m(t,\cdot) = n\}) \le \frac{\|\pi_{n+1}(t) - \pi_n(t)\|_2^2}{\big(u2^{-n/2}\delta_n(t)\big)^2} \le \frac{2^n}{u^2}\,,$$

and thus $\|t^2_n\|_1 \le 2^{n/2}\delta_n(t)/u$ and hence $\|t^2\|_1 \le LS/u$. □
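The stopping rule (6.68) is easy to simulate on a finite probability space. The sketch below is a toy model built on three assumptions:

- the chain $\pi_0(t), \dots, \pi_6(t)$ is hypothetical random data;
- $\delta_n(t)$ is replaced by $\|\pi_{n+1}(t) - \pi_n(t)\|_2$, which is at most the diameter $\Delta_2(A_n(t))$ used in the text;
- a finite depth stands in for $n \to \infty$.

It checks the telescoping bound before the stopping time and the Markov bound $\nu(\{m(t,\cdot) = n\}) \le 2^n/u^2$.

```python
import math
import random

def stopping_index(path, thresholds):
    """m(t, omega) of (6.68) along one path n -> pi_n(t)(omega); None stands for +infinity."""
    for n in range(len(path) - 1):
        if abs(path[n + 1] - path[n]) >= thresholds[n]:
            return n
    return None

def l2_norm(values):
    """L^2 norm for the uniform probability on a finite space."""
    return math.sqrt(sum(x * x for x in values) / len(values))

random.seed(0)
M, depth, u = 200, 6, 2.0   # |Omega|, number of chaining steps, parameter u
# Hypothetical chain pi_0(t), ..., pi_depth(t): geometrically shrinking increments,
# occasionally inflated by a factor 5 so that the stopping rule fires at some points.
chain = [[0.0] * M]
for n in range(1, depth + 1):
    chain.append([x + random.gauss(0.0, 2.0 ** (-n)) * (5.0 if random.random() < 0.05 else 1.0)
                  for x in chain[-1]])

# Stand-in for delta_n(t): the L^2 norm of the increment (the text uses a larger diameter).
delta = [l2_norm([b - a for a, b in zip(chain[n], chain[n + 1])]) for n in range(depth)]
thresholds = [u * 2 ** (-n / 2) * delta[n] for n in range(depth)]

m, t1 = [], []
for w in range(M):
    path = [chain[n][w] for n in range(depth + 1)]
    k = stopping_index(path, thresholds)
    stop = depth if k is None else k      # finite depth replaces the limit when m = infinity
    m.append(k)
    t1.append(path[stop])                 # t^1(omega) = pi_{m(t,omega)}(t)(omega)
    # before the stopping time every increment is below its threshold (telescoping bound)
    assert abs(path[stop] - path[0]) <= sum(thresholds[:stop]) + 1e-12

for n in range(depth):
    share = sum(1 for k in m if k == n) / M   # nu({m(t, .) = n})
    assert share <= 2 ** n / u ** 2           # Markov's inequality
```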
Throughout this section, we consider a probability space $(\Omega, \mu)$ and (to avoid
well-understood measurability problems) a countable bounded subset of $L^2(\mu)$, which,
following the standard notation in empirical processes theory, we will denote by $\mathcal{F}$
rather than $T$. (Since $\mathcal{F}$ is countable, there is no need to really distinguish between
actual functions on $\Omega$ and classes of functions in $L^2(\mu)$.) To lighten notation, we
set

$$\mu(f) = \int f\,d\mu\,.$$

Consider i.i.d. r.v.s $(X_i)_{i\ge 1}$ valued in $\Omega$, distributed like $\mu$ and

$$S_N(\mathcal{F}) = \mathsf{E}\sup_{f\in\mathcal{F}}\Big|\sum_{i\le N}\big(f(X_i) - \mu(f)\big)\Big|\,. \tag{6.70}$$


We have already seen in Chap. 4 the importance of evaluating such quantities. As in
the case of Bernoulli processes, there should be two different reasons why the sum
$S_N(\mathcal{F})$ should be small:

• On the one hand, there may be cancellation between the different terms.
• On the other hand, it might happen that the sum $\sum_{i\le N}|f(X_i) - \mu(f)|$ is already
small without cancellation.
More specifically, we have the inequality

$$S_N(\mathcal{F}) \le 2\mathsf{E}\sup_{f\in\mathcal{F}}\sum_{i\le N}|f(X_i)|\,. \tag{6.71}$$

To see this, we simply write

$$S_N(\mathcal{F}) \le \mathsf{E}\sup_{f\in\mathcal{F}}\sum_{i\le N}|f(X_i) - \mu(f)| \le \mathsf{E}\sup_{f\in\mathcal{F}}\sum_{i\le N}|f(X_i)| + N\sup_{f\in\mathcal{F}}|\mu(f)|$$

and we observe that the first term in the previous line is $\ge$ the second term through
Jensen's inequality.
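Inequality (6.71) can be checked exactly on a toy example. The sketch computes both sides by enumerating all outcomes of $(X_1, \dots, X_N)$ with rational arithmetic. The three-point space, the class $\mathcal{F}$ and the value $N = 3$ are hypothetical choices.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy example: Omega = {0, 1, 2} with the uniform probability mu,
# and a class F of three functions given by their values (f(0), f(1), f(2)).
omega = [0, 1, 2]
mu = {w: Fraction(1, 3) for w in omega}
F = [(1, -1, 0), (2, 0, -2), (0, 3, -1)]
N = 3

def mean(f):
    """mu(f), the integral of f with respect to mu."""
    return sum(mu[w] * f[w] for w in omega)

S_N = Fraction(0)        # S_N(F) of (6.70), computed exactly
no_cancel = Fraction(0)  # E sup_f sum_{i <= N} |f(X_i)|, the quantity on the right of (6.71)
for sample in product(omega, repeat=N):      # every outcome (X_1, ..., X_N)
    prob = Fraction(1)
    for w in sample:
        prob *= mu[w]
    S_N += prob * max(abs(sum(f[x] - mean(f) for x in sample)) for f in F)
    no_cancel += prob * max(sum(abs(f[x]) for x in sample) for f in F)

assert S_N <= 2 * no_cancel   # inequality (6.71)
```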
We may also bound $S_N(\mathcal{F})$ using chaining as follows:

Proposition 6.8.1 If $0 \in \mathcal{F}$, we have

$$S_N(\mathcal{F}) \le L\big(\sqrt{N}\gamma_2(\mathcal{F}, d_2) + \gamma_1(\mathcal{F}, d_\infty)\big)\,, \tag{6.72}$$

where $d_2$ and $d_\infty$ are the distances on $\mathcal{F}$ induced by the norms of $L^2$ and $L^\infty$,
respectively.

Proof This follows from Bernstein's inequality (4.44) and Theorem 4.5.13 just as
in the case of Theorem 4.5.16. The requirement that $0 \in \mathcal{F}$ is made necessary by
considering the case where $\mathcal{F}$ consists of one single function $f$ (and because of the
absolute values in (6.70)). □


The bound (6.71) does not involve cancellation, and is of a genuinely different nature
from (6.72), which involves cancellation in an essential way through Bernstein's
inequality.

Having two completely different methods (6.72) and (6.71) to control $S_N(\mathcal{F})$, we
can interpolate between them in the spirit of (6.9) as follows:


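Proposition 6.8.2 Consider classes $\mathcal{F}$, $\mathcal{F}_1$ and $\mathcal{F}_2$ of functions in $L^2(\mu)$, and
assume that $\mathcal{F} \subset \mathcal{F}_1 + \mathcal{F}_2$. Assume that $0 \in \mathcal{F}_1$. Then

$$S_N(\mathcal{F}) = \mathsf{E}\sup_{f\in\mathcal{F}}\Big|\sum_{i\le N}\big(f(X_i) - \mu(f)\big)\Big| \le L\big(\sqrt{N}\gamma_2(\mathcal{F}_1, d_2) + \gamma_1(\mathcal{F}_1, d_\infty)\big) + 2\mathsf{E}\sup_{f\in\mathcal{F}_2}\sum_{i\le N}|f(X_i)|\,.$$

Proof Since $\mathcal{F} \subset \mathcal{F}_1 + \mathcal{F}_2$, it is clear that $S_N(\mathcal{F}) \le S_N(\mathcal{F}_1) + S_N(\mathcal{F}_2)$. We then
use the bound (6.72) for the first term and the bound (6.71) for the second term. □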


It seems worth repeating what we have just said, as this is going to be a major
theme of this work. Given a sum of random functions, depending on a parameter,
there are two fundamentally different methods to bound the supremum of this sum
over the parameter:

• We may use chaining.
• Or we may forget about possible cancellations and bound the sum of the random
functions by the sum of their absolute values.

It is a rather extraordinary fact that in a wide range of situations, there is no other
way to bound the sum of random functions than interpolating between these two
methods (just as we did in Proposition 6.8.2). This will be proved in Chap. 11.

The first instance of this extraordinary fact that we meet is that there is no other way
to control $S_N(\mathcal{F})$ than the method of Proposition 6.8.2.⁶ We formalize this
fundamental result as follows:


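Theorem 6.8.3 (The Fundamental Theorem of Empirical Processes) Consider
a class $\mathcal{F}$ of functions in $L^2(\mu)$ with $\mu(f) = 0$ for $f \in \mathcal{F}$ and an integer $N$. Then
we can find a decomposition $\mathcal{F} \subset \mathcal{F}_1 + \mathcal{F}_2$ with $0 \in \mathcal{F}_1$ such that the following
properties hold:

$$\gamma_2(\mathcal{F}_1, d_2) \le \frac{L}{\sqrt{N}}S_N(\mathcal{F})\,, \qquad \gamma_1(\mathcal{F}_1, d_\infty) \le LS_N(\mathcal{F})\,,$$

$$\mathsf{E}\sup_{f\in\mathcal{F}_2}\sum_{i\le N}|f(X_i)| \le LS_N(\mathcal{F})\,.$$

We are not ready yet for the proof of this result, which is delayed until Chap. 11.⁷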


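Exercise 6.8.4 We say that a countable class $\mathcal{F}$ of functions is a Glivenko–Cantelli
class if

$$\lim_{N\to\infty}\frac{S_N(\mathcal{F})}{N} = \lim_{N\to\infty}\mathsf{E}\sup_{f\in\mathcal{F}}\Big|\frac{1}{N}\sum_{i\le N}f(X_i) - \mu(f)\Big| = 0\,.$$

Assuming that $\mathcal{F}$ is uniformly bounded, prove that $\mathcal{F}$ is a Glivenko–Cantelli class
if and only if for each $\epsilon > 0$, one can find a decomposition $\mathcal{F} \subset \mathcal{F}_1 + \mathcal{F}_2$ and an
integer $N_0$ such that $\mathcal{F}_1$ is finite and

$$N \ge N_0 \Rightarrow \mathsf{E}\sup_{f\in\mathcal{F}_2}\frac{1}{N}\sum_{i\le N}|f(X_i)| \le \epsilon\,.$$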


⁶ In particular Bernstein's inequality suffices to perform the chaining.
⁷ A good path to get a feeling for this theorem is to study Exercise 14.2.3 in due time.

Hint: Use Theorem 6 of [99] for the "only if" part. Warning: You need good
technique to succeed.


Key Ideas to Remember 


Bernoulli random variables (i.e., independent random signs) are among the 
most important probability structures. Their linear combinations satisfy the 
fundamental sub-Gaussian inequality (6.2). 

A Bernoulli process is always smaller than the corresponding Gaussian process. 
A Bernoulli process can however be bounded in a trivial way without using 
cancellation. 

One may interpolate between the two previous methods to bound a Bernoulli
process. That this interpolation provides the best possible method of bounding a
Bernoulli process is the fundamental Latała–Bednorz theorem.

Bernoulli processes satisfy concentration of measure properties which are even 
better than those of Gaussian processes. 

Bernoulli processes satisfy a suitable version of Sudakov minoration, which 
however requires a control in the supremum norm. 

Bernoulli processes satisfy a fundamental comparison principle: contracting the 
coefficients decreases the size of the process. 

Elements of a not too large set of functions on a measure space can be split into
their "peaky part" and their "spread out part", an idea which we will push very
far.

When one looks at discrepancy bounds for classes of functions in the spirit of the
Latała–Bednorz theorem, one is led to formulate amazing conjectures, which
will turn out to be true as we will prove later.


6.9 Notes and Comments 


A rather different proof of Proposition 6.4.8 is given in [113]. Probably the proof 
of [113] is more elegant and deeper than the proof we give here, but the latter has 
the extra advantage of showing the connection between Proposition 6.4.8 and the 
Marcus-Pisier theorem, Theorem 7.4.2. 


Chapter 7
Random Fourier Series and Trigonometric Sums


The topic of random Fourier series illustrates well the impact of abstract methods, 
and it might be useful to provide an (extremely brief) history of the topic. 

In a series of papers in 1930 and 1932, R. Paley and A. Zygmund [78–80] raised
(among other similar problems) the question of the uniform convergence of the
random series

$$\sum_{k\ge 1} a_k\varepsilon_k\exp(ikx) \tag{7.1}$$

over $x \in [0, 2\pi]$, where $a_k$ are real numbers and $\varepsilon_k$ are independent Bernoulli r.v.s
(and where $i^2 = -1$). Considering the numbers $s_p$ defined by $s_p = \big(\sum_{2^p\le k<2^{p+1}} a_k^2\big)^{1/2}$,
they prove in particular the necessity of the condition $\sum_p s_p < \infty$. Later, R. Salem
and A. Zygmund [94] proved that if the sequence $(s_p)_{p\ge 0}$ is non-increasing, then,
conversely, the condition $\sum_p s_p < \infty$ suffices for the uniform convergence of the
random Fourier series. The combination of these two results is remarkably sharp,
but certainly does not characterize the series (7.1) which converge uniformly.

The discovery by X. Fernique that Dudley's bound could be reversed for
stationary Gaussian processes [32] was a major advance, with considerable influence. The
Dudley–Fernique characterization of boundedness of stationary Gaussian processes
opened the way for M. Marcus and G. Pisier to find necessary and sufficient
conditions for uniform convergence of a large class of random Fourier series. The
conditions of Marcus and Pisier are of the type $\gamma_2([0, 2\pi], d) < \infty$ for a certain
distance $d$, and it is a non-trivial task (which is thoroughly performed in [61] and
will not be repeated here) to show that they improve on the "classical" results of
Paley, Salem, and Zygmund. The results of [61] cover not only the case of series of
the type (7.1) but more general cases such as the series

$$\sum_{k\ge 1} a_k\xi_k\exp(ikx) \tag{7.2}$$
where the independent symmetric r.v.s $\xi_k$ satisfy $\sup_k (\mathsf{E}\xi_k^2)^{1/2}/\mathsf{E}|\xi_k| < \infty$ (and
many other situations).

The work of Marcus and Pisier on random Fourier series was extended by Marcus 
[59] to more general situations (that involve the infinitely divisible processes that 
we will study in Chap. 12). Marcus fails however to obtain necessary and sufficient 
conditions. Obtaining these requires the new idea of “families of distances” which 
we develop in Sect. 7.5. In the present chapter, we provide (in a far more general 
setting) what is in a sense the final result, necessary and sufficient conditions for 
the almost sure convergence of the series (7.2) assuming only that the r.v.s $\xi_k$ are
independent and symmetric.

In retrospect it might be hard to understand why the topic of random Fourier 
series was so popular at one point. What is certain is that the interest had already 
waned when the author performed his work, and it is doubtful that this work has yet 
found even a single reader. 

Why, then, should you bother to even read a single line of the present chapter? 
First, the title of the chapter is somewhat misleading. The main focus of it is not 
to decide whether certain series converge or not, but to provide upper and lower 
bounds on the supremum norm of certain random trigonometric sums.¹ When one
has obtained upper and lower bounds which are sufficiently close to each other, it 
is hardly more than an exercise to obtain necessary and sufficient conditions for 
the convergence of random Fourier series. This exercise is carried out in Sect. 7.10. 
Why, then, should you be interested in random trigonometric sums? The reason is 
very simple. In general, the study of non-Gaussian processes (and in particular the 
search for lower bounds) is an order of magnitude harder than the Gaussian case, 
because all kinds of difficulties occur simultaneously. One of them (which already 
occurs in the Gaussian case) is the need to use “generic chaining ideas” because 
covering numbers do not suffice. A fundamental feature of random trigonometric 
sums is that this specific difficulty does not occur: in contrast with the case of general 
processes, covering numbers do suffice. Random Fourier series provide a simple 
setting where we can first learn to face the difficulties inherent to non-Gaussian 
processes, before we face these difficulties in the much harder context of the generic 
chaining. 

The reason why covering numbers suffice to study random Fourier series is 
that the distances involved on the underlying group are translation-invariant. As 
we will learn in the next section, for translation-invariant distances, not only 
covering numbers suffice, but these covering numbers can basically be computed 
by evaluating the Haar measure of certain small balls. This is a tremendous 
simplification (which explains why very precise results can be proved). And how 
could we dream of making progress on general processes if we do not thoroughly 


¹ These objects will be defined precisely in a few pages.

understand this much simpler case first? This is why the author concentrated his
efforts for many years on random Fourier series. The strategy paid off: this setting 
turned out to be ideal to invent some of the fundamental tools on which the 
subsequent chapters are built and first of all the concept of a family of distances. 
Furthermore, the ideas used in the present chapter to obtain lower bounds on random 
Fourier series will be given a sweeping generalization in Chap. 11, and this will shed 
considerable light on the structure of several fundamental processes. So, the reader’s 
main motivation need not be the results on random Fourier series per se, but the ideas 
she will learn from studying them. In fact, successfully reading the rest of Part II 
probably requires reading the present chapter up to Sect. 7.7 inclusive. 

We start the chapter by investigating the central structure, translation-invariant
distances. Our first basic results on random Fourier series are proved in
Sects. 7.2–7.4. The main results are stated in Sect. 7.5, where the concept of a family of
distances is also introduced.


7.1 Translation-Invariant Distances 


The superiority of the generic chaining bound (2.59) over Dudley's entropy
bound (2.38) is its ability to take advantage of the fact that the metric space $(T, d)$
(where the distance $d$ controls the increments of the process as in (2.4)) need not be
"homogeneous" in the sense that at a given scale different regions of the space may
look very different from each other. When, however, this is not the case, the situation
should be simpler and Dudley's bound should be optimal. A typical such case is
when $T$ is a compact metrizable Abelian group² and $d$ is a translation-invariant
distance

$$\forall s, t, v \in T\,, \quad d(s+v, t+v) = d(s, t)\,.$$

We denote by $\mu$ the normalized Haar measure of $T$, that is, $\mu(T) = 1$ and $\mu$ is
translation-invariant. Thus, all balls for $d$ with a given radius have the same Haar
measure.³

To study the size of the space $(T, d)$, it is very convenient in this setting to use
as a "main parameter" the function $\epsilon \mapsto \mu(B_d(0, \epsilon))$. We recall that we defined

$$N_0 = 1 \quad\text{and}\quad N_n = 2^{2^n} \ \text{ for } n \ge 1\,.$$


² At the expense of minor complications, the same methods cover the case where $T$ is a subset with
non-empty interior of a locally compact group.

³ If you find this setting too abstract, you may assume that $T$ is $\mathbb{R}/(2\pi\mathbb{Z})$. The proofs are identical,
but you will soon realize that cluttering your mind with irrelevant information makes things harder,
illustrating the wisdom of the advice of Gustave Choquet given on page 1.
Theorem 7.1.1 Consider a continuous⁴ translation-invariant distance $d$ on $T$. For
$n \ge 0$ define

$$\epsilon_n = \inf\{\epsilon > 0;\ \mu(B_d(0, \epsilon)) \ge N_n^{-1}\}\,. \tag{7.3}$$

Then

$$\frac{1}{L}\sum_{n\ge 0}\epsilon_n 2^{n/2} \le \gamma_2(T, d) \le L\sum_{n\ge 0}\epsilon_n 2^{n/2}\,. \tag{7.4}$$
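To get a feeling for the numbers $\epsilon_n$ of (7.3), here is a sketch for a hypothetical example. The group is $T = \mathbb{Z}_M$, the integers modulo $M = 1024$, whose normalized Haar measure is counting measure divided by $M$. The distance is the circular distance rescaled to $[0, 1/2]$. The sketch assumes closed balls $B_d(0, \epsilon) = \{t;\ d(t, 0) \le \epsilon\}$, so the infimum in (7.3) is attained at one of the values $d(t, 0)$ (or is $0$).

```python
M = 1024  # hypothetical group T = Z_M; its normalized Haar measure is counting measure / M

def d(s, t):
    """Circular distance on Z_M rescaled to [0, 1/2]; it is translation-invariant."""
    k = (s - t) % M
    return min(k, M - k) / M

def N(n):
    """N_0 = 1 and N_n = 2^(2^n) for n >= 1."""
    return 1 if n == 0 else 2 ** (2 ** n)

def epsilon(n):
    """epsilon_n of (7.3) with closed balls: the smallest radius r such that
    the ball B_d(0, r) contains at least M / N_n points, i.e. has measure >= 1/N_n."""
    radii = sorted(d(t, 0) for t in range(M))
    needed = -(-M // N(n))   # ceil(M / N_n), computed in exact integer arithmetic
    return radii[needed - 1]

eps = [epsilon(n) for n in range(8)]
dudley_sum = sum(e * 2 ** (n / 2) for n, e in enumerate(eps))   # the series in (7.4)
```

In this example $\mu(B_d(0, \epsilon)) \approx 2\epsilon$, so $\epsilon_n \approx N_n^{-1}/2$ as long as $N_n < M$. Once $N_n \ge M$, the ball $\{0\}$ alone is large enough and $\epsilon_n = 0$, so the series in (7.4) has only finitely many nonzero terms.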


Our first lemma shows that the numbers $\epsilon_n$ are basically the entropy numbers
of Sect. 2.5, so that (7.4) simply states (as expected in this homogeneous case) that
$\gamma_2(T, d)$ is equivalent to Dudley's integral.

Lemma 7.1.2 The entropy numbers $e_n(T) = e_n(T, d)$ satisfy

$$\epsilon_n \le e_n(T) \le 2\epsilon_n\,. \tag{7.5}$$


This near trivial lemma has staggering consequences: the only characteristic of 
the balls Bz(0, €) which influences the entropy numbers is their measure, entirely 
irrespective of their shape. As we will explain soon, it is really child’s play to control 
this measure. 

Lemma 7.1.2 is in turn based on an even more trivial “volume argument” which 
we state separately for further use. 


Lemma 7.1.3 Consider a subset $B$ of $T$. Then there exists a subset $U$ of $T$ with
$\operatorname{card} U \le 1/\mu(B)$ such that whenever $t \in T$ we can find $s \in U$ with $t \in s + B - B$,
where $B - B = \{t_1 - t_2;\ t_1, t_2 \in B\}$.

Proof Any set $U$ such that the sets $s + B$ are disjoint for $s \in U$ satisfies
$\operatorname{card} U \cdot \mu(B) \le 1$ because $\mu(s + B) = \mu(B)$. Thus there exists such a set $U$ whose cardinality
is as large as possible. Then for each $t \in T$, there exists $s \in U$ for which
$(t + B) \cap (s + B) \ne \emptyset$ (for otherwise we could add the point $t$ to $U$). Then $t \in s + B - B$. □


Corollary 7.1.4 If $\mu(B) > 1/2$ then $T = B - B$.

Proof In that case $\operatorname{card} U = 1$ so that $U$ consists of a single point $u$, and
$t - u \in B - B$ for each $t$ in $T$, so that $B - B = T$. □
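The volume argument of Lemma 7.1.3 is constructive on a finite group. Scan the points of $T$ and keep a point whenever its translate of $B$ is disjoint from those already kept. This produces a maximal family of disjoint translates, hence a set $U$ as in the proof. The sketch below runs this on the hypothetical group $\mathbb{Z}_{60}$ with a hypothetical set $B$, and also checks Corollary 7.1.4 on a set of measure larger than $1/2$.

```python
M = 60  # hypothetical finite group T = Z_M; mu = counting measure / M

def maximal_disjoint_translates(B):
    """Scan T and keep t whenever t + B misses all translates kept so far.
    The resulting family of disjoint translates is maximal, as in the proof of Lemma 7.1.3."""
    U, used = [], set()
    for t in range(M):
        translate = {(t + b) % M for b in B}
        if not translate & used:
            U.append(t)
            used |= translate
    return U

def difference_set(B):
    """B - B = {b1 - b2 ; b1, b2 in B}, computed modulo M."""
    return {(b1 - b2) % M for b1 in B for b2 in B}

B = {0, 1, 2, 7, 11}                 # hypothetical B with mu(B) = 5/60
U = maximal_disjoint_translates(B)
BB = difference_set(B)
assert len(U) * len(B) <= M          # card U <= 1/mu(B)
assert all(any((t - s) % M in BB for s in U) for t in range(M))   # every t lies in some s + B - B

# Corollary 7.1.4: if mu(B) > 1/2 then B - B is the whole group.
big = set(range(31))                 # mu(big) = 31/60 > 1/2
assert difference_set(big) == set(range(M))
```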


Exercise 7.1.5 Find a direct argument. Convince yourself that the conclusion 
utterly fails when the distance is not required to be translation-invariant. 


Lemma 7.1.6 For any $\epsilon > 0$, we have $B(0, \epsilon) - B(0, \epsilon) \subset B(0, 2\epsilon)$.


⁴ That is, the function $(s, t) \mapsto d(s, t)$ is continuous on $T^2$. This is the case of interest, but much
weaker regularity properties suffice. All we really need is that the balls are measurable.

Proof If $t, t' \in B(0, \epsilon)$, then $d(t - t', 0) \le d(t - t', -t') + d(-t', 0) = d(t, 0) +
d(0, t') \le 2\epsilon$ using twice that the distance is translation-invariant. □


The previous lemmas play a fundamental role throughout the chapter, where
translation-invariance is a constant feature.


Proof of Lemma 7.1.2 Since $\mu$ is translation-invariant, all the balls of $T$ with the
same radius have the same measure. Consequently if one can cover $T$ by $N_n$ balls
with radius $\epsilon$, then $\epsilon_n \le \epsilon$, and this proves the left-hand side inequality.

To prove the right-hand side, we use Lemma 7.1.3 for $B = B(0, \epsilon_n)$. Thus we
can cover $T$ by at most $\operatorname{card} U \le \mu(B)^{-1} \le N_n$ translates $s + B - B$ of the set
$B - B$. Furthermore $s + B - B \subset s + B(0, 2\epsilon_n) = B(s, 2\epsilon_n)$ by Lemma 7.1.6. □
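The proof of Lemma 7.1.2 can likewise be run on a finite group. The sketch below makes the following choices (all ours):

- the hypothetical group is $\mathbb{Z}_{240}$, with characters $\chi_k(t) = \exp(2\pi i k t/240)$;
- the distance has the form (7.9), with $a_1 = 1$ and $a_7 = 2$;
- $\epsilon_n$ is computed with closed balls.

It then checks two things. First, Lemma 7.1.3 for $B = B(0, \epsilon_n)$ produces at most $N_n$ centers. Second, the balls of radius $2\epsilon_n$ around these centers cover $T$, so that $e_n(T) \le 2\epsilon_n$.

```python
import cmath
import math

M = 240  # hypothetical group T = Z_M, normalized counting measure as Haar measure

def chi(k, t):
    """Character chi_k(t) = exp(2 pi i k t / M) of Z_M."""
    return cmath.exp(2j * math.pi * k * t / M)

def d(s, t):
    """Distance of the form (7.9) with a_1 = 1 and a_7 = 2 (hypothetical coefficients)."""
    return math.sqrt(abs(chi(1, s) - chi(1, t)) ** 2
                     + 4 * abs(chi(7, s) - chi(7, t)) ** 2)

def N(n):
    """N_0 = 1 and N_n = 2^(2^n) for n >= 1."""
    return 1 if n == 0 else 2 ** (2 ** n)

def eps(n):
    """epsilon_n of (7.3) with closed balls: smallest radius r with mu(B(0, r)) >= 1/N_n."""
    radii = sorted(d(t, 0) for t in range(M))
    needed = -(-M // N(n))          # ceil(M / N_n) points are needed in the ball
    return radii[needed - 1]

def covering_centers(r):
    """Set U of Lemma 7.1.3 for B = B(0, r): a maximal family of pairwise disjoint translates."""
    ball = [b for b in range(M) if d(b, 0) <= r]
    U, used = [], set()
    for t in range(M):
        translate = {(t + b) % M for b in ball}
        if not translate & used:
            U.append(t)
            used |= translate
    return U

for n in range(4):
    r = eps(n)
    U = covering_centers(r)
    assert len(U) <= N(n)           # card U <= 1/mu(B) <= N_n
    # Lemma 7.1.6: s + B - B lies in B(s, 2r), so these N_n balls cover T and e_n(T) <= 2 eps_n
    assert all(min(d(t, s) for s in U) <= 2 * r + 1e-9 for t in range(M))
```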


Proof of Theorem 7.1.1 The right-hand side inequality follows from (7.5)
and (2.56). To prove the left-hand side inequality, we consider an admissible
sequence $(\mathcal{A}_n)$ of partitions of $T$ with $\sup_{t\in T}\sum_{n\ge 0} 2^{n/2}\Delta(A_n(t), d) \le 2\gamma_2(T, d)$.
We construct by induction a decreasing sequence $C_n \in \mathcal{A}_n$ with $\mu(C_n) \ge N_{n+1}^{-1}$ as
follows. First we choose $C_0 = T$. Having constructed $C_n \in \mathcal{A}_n$, we note that

$$N_{n+1}^{-1} \le \mu(C_n) = \sum\{\mu(A);\ A \in \mathcal{A}_{n+1},\ A \subset C_n\}\,,$$

and since the sum has at most $N_{n+1}$ terms, one of these is $\ge N_{n+1}^{-2} = N_{n+2}^{-1}$. Thus
there exists $A \in \mathcal{A}_{n+1}$ with $A \subset C_n$ and $\mu(A) \ge N_{n+2}^{-1}$. We choose for $C_{n+1}$ such
a set $A$, completing the induction.

Since $d$ is translation-invariant, it follows from (7.3) that $C_n$ cannot be contained
in a ball with radius $< \epsilon_{n+1}$, and thus that $\Delta(C_n, d) \ge \epsilon_{n+1}$.

Consider now $t \in C_k$. For $0 \le n \le k$, we have $t \in C_n \in \mathcal{A}_n$ so that $A_n(t) = C_n$
and thus

$$\sum_{0\le n\le k}\epsilon_{n+1}2^{n/2} \le \sum_{0\le n\le k}2^{n/2}\Delta(C_n, d) = \sum_{0\le n\le k}2^{n/2}\Delta(A_n(t), d) \le 2\gamma_2(T, d)\,.$$

Since $\epsilon_0 \le \Delta(A_0, d)$, this completes the proof of the left-hand side inequality
of (7.4).⁵ □


Recalling the numbers $\epsilon_n$ of (7.3), it is very useful for the sequel to form the
following mental picture:

For our purposes, it is the number $\sum_{n\ge 0}\epsilon_n 2^{n/2}$ which determines the size of the space $(T, d)$.


⁵ There is no reason for which the sets of $\mathcal{A}_n$ should be measurable for $\mu$, but our argument works
anyway replacing "measure" by "outer measure".

All we will have to do to understand very general random Fourier series is to
discover the proper numerical series whose convergence is equivalent to the almost
sure convergence of the random Fourier series!


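Exercise 7.1.7 With the notation of Theorem 7.1.1 prove that for a constant $K$
depending only on $\alpha$, for $\alpha \ge 1$ we have

$$\frac{1}{K}\sum_{n\ge 0}\epsilon_n 2^{n/\alpha} \le \gamma_\alpha(T, d) \le K\sum_{n\ge 0}\epsilon_n 2^{n/\alpha}\,. \tag{7.6}$$

The following will be used many times:

Exercise 7.1.8 Assume that for $n \ge 0$, we have a set $D_n \subset T$ with $\mu(D_n) \ge N_n^{-1}$
and

$$s \in D_n \Rightarrow d(s, 0) \le \epsilon_n\,.$$

Then $\gamma_2(T, d) \le L\sum_{n\ge 0} 2^{n/2}\epsilon_n$.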


7.2 Basics 
7.2.1 Simplification Through Abstraction 


How should one approach the study of the random series (7.2)? To study the uniform
convergence of such a series for $x \in [0, 2\pi]$, we will have to control quantities such
as

$$\sup_{0\le x\le 2\pi}\Big|\sum_{k\le n}\xi_k\exp(ikx)\Big|\,.$$

Let us observe that $t := \exp(ix)$ is a complex number of modulus 1 and that
$\exp(ikx) = t^k$ so the above quantity is

$$\sup_{t\in U}\Big|\sum_{k\le n}\xi_k t^k\Big|\,,$$

where $U$ is the set of complex numbers of modulus 1. Provided with the
multiplication, $U$ is a compact metrizable group. The functions $\chi_k(t) = t^k$ have a very special
property with respect to the group operation: $\chi_k(st) = \chi_k(s)\chi_k(t)$.⁶

These remarks suggest thinking of the series (7.2) as a series $\sum_{k\ge 1}\xi_k\chi_k(t)$
of random functions on $U$. This abstract point of view is extremely fruitful, as
otherwise it would be impossible to resist the temptation to use the special structure
of the set $[0, 2\pi]$ and its natural distance.⁷ It took indeed a very long time to
understand that this natural distance is not what is relevant here.

⁶ If you like big words, they are group homomorphisms from $U$ to $U$.


7.2.2 Setting 


We consider a compact metrizable Abelian group $T$.⁸ Since $T$ is Abelian, we follow
the tradition of denoting the group operation additively.⁹ A character $\chi$ on $T$ is a
continuous map from $T$ to $\mathbb{C}$ such that $|\chi(t)| = 1$ for each $t$ and $\chi(s+t) = \chi(s)\chi(t)$
for any $s, t \in T$.¹⁰ In particular $\chi(0) = 1$. Under pointwise multiplication, the set of
characters on $T$ forms a group $G$ called the dual group of $T$. The unit element of this
group is the character $\mathbf{1}$ which takes the value 1 everywhere. If this abstract setting
bothers you, you will lose nothing by assuming everywhere $T = U$ and $G = \mathbb{Z}$,¹¹
except that the extra information will needlessly clutter your mind.

For our purpose, the fundamental property of characters is that for $s, t, u \in T$,
we have $\chi(s+u) - \chi(t+u) = \chi(u)(\chi(s) - \chi(t))$ so that

$$|\chi(s+u) - \chi(t+u)| = |\chi(s) - \chi(t)|\,. \tag{7.7}$$

Taking $u = -t$ in (7.7) we get

$$|\chi(s) - \chi(t)| = |\chi(s-t) - 1|\,. \tag{7.8}$$


A random Fourier series is a series

$$\sum_{i\ge 1}\xi_i\chi_i\,,$$

where $\xi_i$ is a complex-valued r.v. and $\chi_i$ is a (nonrandom) character. We assume
that $(\xi_i)_{i\ge 1}$ are symmetric independent r.v.s. It is not required that $\chi_i \ne \chi_j$ for
$i \ne j$, although one may assume this condition without loss of generality. We will
study the convergence of such series in Sect. 7.10, and for now we concentrate on
the central part of this study, i.e., the study of finite sums $\sum_i\xi_i\chi_i$, which we call
random trigonometric sums. Thus finite sums are denoted by $\sum_i$, as a shorthand


⁷ Let us remember that going to an abstract setting was also a very important step in the
understanding of the structure of Gaussian processes.

⁸ It requires only minor complications (but no new idea) to develop the theory in locally compact
groups.

⁹ In contrast with the case of the group $U$ where multiplicative notation is more natural.
¹⁰ So that $\chi$ is simply a group homomorphism from $T$ to $U$.
¹¹ To $k \in \mathbb{Z}$ corresponds the character $\chi_k(t) = t^k$.
for $\sum_{i\in I}$ where $I$ is a set of indices, whereas infinite series are denoted by $\sum_{i\ge 1}$.
We denote by $\|\cdot\|$ the supremum norm of such a sum, so that

$$\Big\|\sum_i\xi_i\chi_i\Big\| = \sup_{t\in T}\Big|\sum_i\xi_i\chi_i(t)\Big|\,.$$

This notation is used throughout this chapter and must be learned now. Our ultimate
goal is to find upper and lower bounds for the quantity $\|\sum_i\xi_i\chi_i\|$ that are of the
same order in full generality, but we first state some simple facts.


7.2.3 Upper Bounds in the Bernoulli Case 


Particularly important random trigonometric sums are sums of the type $\sum_i a_i\varepsilon_i\chi_i$,
where $a_i$ are numbers, $\chi_i$ are characters, and $\varepsilon_i$ are independent Bernoulli r.v.s.¹²
For such a sum let us consider the distance $d$ on $T$ defined by

$$d(s, t)^2 = \sum_i |a_i|^2|\chi_i(s) - \chi_i(t)|^2\,, \tag{7.9}$$

which is translation-invariant according to (7.7).
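The translation invariance of the distance (7.9) can be checked numerically on $T = U$. The sketch parametrizes $U$ by angles, so that the group operation becomes addition of angles. The frequencies and the random complex coefficients are hypothetical.

```python
import cmath
import math
import random

random.seed(2)
# Hypothetical sum: frequencies k_i with random complex coefficients a_i.
terms = [(k, complex(random.gauss(0, 1), random.gauss(0, 1))) for k in (1, 3, 4, 10, 25)]

def chi(k, x):
    """Character chi_k(t) = t^k of U, evaluated at t = exp(i x)."""
    return cmath.exp(1j * k * x)

def d(x, y):
    """Distance (7.9) between s = exp(i x) and t = exp(i y)."""
    return math.sqrt(sum(abs(a) ** 2 * abs(chi(k, x) - chi(k, y)) ** 2 for k, a in terms))

for _ in range(500):
    x, y, z = (random.uniform(0, 2 * math.pi) for _ in range(3))
    # multiplying s and t by exp(i z) adds z to both angles: (7.7) gives invariance
    assert math.isclose(d(x + z, y + z), d(x, y), rel_tol=1e-9, abs_tol=1e-12)
    # (7.8): d(s, t) = d(s t^{-1}, 1), i.e. it depends only on the angle difference
    assert math.isclose(d(x, y), d(x - y, 0.0), rel_tol=1e-9, abs_tol=1e-12)
```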


Exercise 7.2.1 Convince yourself that in the case where $T = U$ is the group of
complex numbers of modulus 1, there need not be a simple relation between the
distance of (7.9) and the natural distance of $U$ induced by the distance on $\mathbb{C}$. Hint:
If $\chi_i(t) = t^i$ consider the case where only one of the coefficients $a_i$ is not zero.


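Proposition 7.2.2 Consider the numbers $\epsilon_n$ defined in (7.3) with respect to the
distance $d(s, t)$ of (7.9). Then¹³

$$\Big(\mathsf{E}\Big\|\sum_i a_i\varepsilon_i\chi_i\Big\|^2\Big)^{1/2} \le L\sum_{n\ge 0}2^{n/2}\epsilon_n + \Big(\sum_i|a_i|^2\Big)^{1/2}\,. \tag{7.10}$$

Proof Consider real-valued functions $\eta_i$ on $T$ and the process

$$X_t = \sum_i\eta_i(t)\varepsilon_i\,. \tag{7.11}$$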


¹² So that, in the previous notation, $\xi_i = a_i\varepsilon_i$.

¹³ There is nothing magical in the fact that we use the $L^2$ norm rather than the $L^1$ norm in the
left-hand side of (7.10). We will need this once later. It is known from general principles that these
two norms are equivalent for random trigonometric sums of type $\sum_i a_i\varepsilon_i\chi_i$.
Using the subgaussian inequality (6.2), this process satisfies the increment
condition (2.4) with respect to the distance $d^*$ given by

$$d^*(s, t)^2 = \sum_i|\eta_i(s) - \eta_i(t)|^2\,,$$

and therefore from (2.66),

$$\Big(\mathsf{E}\sup_{s,t\in T}|X_s - X_t|^2\Big)^{1/2} \le L\gamma_2(T, d^*)\,. \tag{7.12}$$

Furthermore (7.12) holds also for complex-valued functions $\eta_i$. This is seen simply
by considering separately the real and imaginary parts. When $\eta_i = a_i\chi_i$ we have
$d^* = d$, and the result follows from the right-hand side of (7.4), using also that
$(\mathsf{E}\sup_{t\in T}|X_t|^2)^{1/2} \le (\mathsf{E}\sup_{s,t\in T}|X_s - X_t|^2)^{1/2} + (\mathsf{E}|X_0|^2)^{1/2}$ and
$\mathsf{E}|X_0|^2 = \sum_i|a_i|^2$.¹⁴ □


The previous result may sound simple, but it is essential to fully understand what 
was the central step in the argument, because it is the very same phenomenon 
which is at the root of the upper bounds we will prove later. This central step is 
Lemma 7.1.3: when the Haar measure of a set B is not too small, one may cover T 
by not too many translates of B — B. This uses translation invariance in a crucial 
way. 


7.2.4 Lower Bounds in the Gaussian Case 


We turn to a lower bound in the case where the r.v.s $\xi_i$ are i.i.d. Gaussian. It is a
simple consequence of the majorizing measure theorem, Theorem 2.10.1.

Lemma 7.2.3 Consider a finite number of independent standard normal r.v.s $g_i$ and
complex numbers $a_i$. Then

$$\frac{1}{L}\gamma_2(T, d) \le \mathsf{E}\Big\|\sum_i a_ig_i\chi_i\Big\|\,, \tag{7.13}$$


¹⁴ Exactly the same result holds when we replace the independent Bernoulli r.v.s $\varepsilon_i$ by independent
Gaussian r.v.s. Moreover, as we will see in (7.40), it is a general fact that "Gaussian sums are larger
than Bernoulli sums", so that considering Gaussian r.v.s yields a stronger result than considering
Bernoulli r.v.s. We are here using Bernoulli r.v.s as we will apply the exact form (7.10).
where $d$ is the distance on $T$ given by

$$d(s, t)^2 = \sum_i|a_i|^2|\chi_i(s) - \chi_i(t)|^2\,. \tag{7.14}$$


We cannot immediately apply the majorizing measure theorem here, because it deals
with real-valued processes, while here we deal with complex-valued ones. To fix
this, we denote by $\Re z$ and $\Im z$ the real part and the imaginary part of a complex
number $z$.


Lemma 7.2.4 Consider a complex-valued process $(X_t)_{t\in T}$, and assume that both
$(\Re X_t)_{t\in T}$ and $(\Im X_t)_{t\in T}$ are Gaussian processes. Consider the distance
$d(s, t) = (\mathsf{E}|X_s - X_t|^2)^{1/2}$ on $T$. Then

$$\frac{1}{L}\gamma_2(T, d) \le \mathsf{E}\sup_{s,t\in T}|X_s - X_t| \le L\gamma_2(T, d)\,. \tag{7.15}$$

Proof Consider the distances $d_1$ and $d_2$ on $T$ given respectively by

$$d_1(s, t)^2 = \mathsf{E}\big(\Re(X_s - X_t)\big)^2$$

and

$$d_2(s, t)^2 = \mathsf{E}\big(\Im(X_s - X_t)\big)^2\,.$$

Combining the left-hand side of (2.114) with Lemma 2.2.1 implies

$$\gamma_2(T, d_1) \le L\mathsf{E}\sup_{s,t\in T}|\Re X_s - \Re X_t| \le L\mathsf{E}\sup_{s,t\in T}|X_s - X_t|$$

and similarly $\gamma_2(T, d_2) \le L\mathsf{E}\sup_{s,t\in T}|X_s - X_t|$. Since $d \le d_1 + d_2$, (4.55) implies
that $\gamma_2(T, d) \le L\mathsf{E}\sup_{s,t\in T}|X_s - X_t|$. To prove the right-hand side inequality
of (7.15), we simply use (2.59) separately for the real and the imaginary part
(keeping Lemma 2.2.1 in mind). □


Proof of Lemma 7.2.3 It follows from (7.15), since
$\mathsf{E}\sup_{s,t\in T}|X_s - X_t| \le 2\mathsf{E}\sup_{t\in T}|X_t|$. □


7.3 Random Distances


7.3.1 Basic Principles 


Without giving details yet, let us sketch some of the main features of our approach
to a trigonometric sum $\sum_i\xi_i\chi_i$. It has the same distribution as the sum $\sum_i\varepsilon_i\xi_i\chi_i$
where $(\varepsilon_i)$ are independent Bernoulli r.v.s which are also independent of the $\xi_i$.
Such a random series will be studied conditionally on the values of the r.v.s $\xi_i$. This
will bring in the distance corresponding to (6.2), namely, the distance $d_\omega$ (where the
subscript $\omega$ symbolizes the randomness of the $\xi_i$) given by

$$d_\omega(s, t)^2 = \sum_i|\xi_i|^2|\chi_i(s) - \chi_i(t)|^2\,. \tag{7.16}$$

We will then try to relate the typical properties of the metric space $(T, d_\omega)$ with the
properties of the metric space $(T, d)$ where

$$d(s, t)^2 = \mathsf{E}d_\omega(s, t)^2 = \sum_i\mathsf{E}|\xi_i|^2|\chi_i(s) - \chi_i(t)|^2\,. \tag{7.17}$$
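The identity $\mathsf{E}d_\omega(s, t)^2 = d(s, t)^2$ is just linearity of expectation. The sketch below confirms it by exact enumeration on the group $\mathbb{Z}_{64}$, for three hypothetical symmetric r.v.s $\xi_i$ that take finitely many values. Note that the contribution of the Bernoulli term to $d_\omega$ does not fluctuate, whereas the second variable vanishes with probability $1/2$. Comparing $d_\omega$ with $d$ is exactly what the principles formulated next are about.

```python
import cmath
import math
from itertools import product

M = 64  # hypothetical group T = Z_M

def chi(k, t):
    """Character chi_k(t) = exp(2 pi i k t / M) of Z_M."""
    return cmath.exp(2j * math.pi * k * t / M)

freqs = [1, 5, 12]
# Hypothetical symmetric laws of xi_1, xi_2, xi_3 as lists of (value, probability).
laws = [
    [(1.0, 0.5), (-1.0, 0.5)],                            # Bernoulli
    [(0.0, 0.5), (2.0, 0.25), (-2.0, 0.25)],              # vanishes half of the time
    [(0.5, 0.4), (-0.5, 0.4), (3.0, 0.1), (-3.0, 0.1)],  # occasionally large
]

def d_omega_sq(xis, s, t):
    """d_omega(s,t)^2 for one realization xis = (xi_1, xi_2, xi_3)."""
    return sum(abs(x) ** 2 * abs(chi(k, s) - chi(k, t)) ** 2 for x, k in zip(xis, freqs))

def d_sq(s, t):
    """d(s,t)^2 = sum_i E|xi_i|^2 |chi_i(s) - chi_i(t)|^2."""
    moments = [sum(p * abs(v) ** 2 for v, p in law) for law in laws]
    return sum(m2 * abs(chi(k, s) - chi(k, t)) ** 2 for m2, k in zip(moments, freqs))

s, t = 3, 17
expected = 0.0
for outcome in product(*laws):     # every joint value of (xi_1, xi_2, xi_3)
    prob = math.prod(p for _, p in outcome)
    expected += prob * d_omega_sq([v for v, _ in outcome], s, t)

assert math.isclose(expected, d_sq(s, t), rel_tol=1e-12)   # E d_omega^2 = d^2
```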


We now try to formulate in a rather imprecise way two key ideas of our approach.¹⁵
You should not expect at this stage to fully understand them. Real understanding will 
come only after a detailed analysis of the forthcoming proofs. Yet, keeping these 
(imprecise) ideas in your mind may help to grasp the overall directions of these 
proofs. We have already met the first principle on page 168, but it bears repetition. 


Principle A If, given $s,t\in T$, it is very rare that the distance $d_\omega(s,t)$ is very much smaller than $d(s,t)$, some measure of size of $(T,d)$ is controlled from above by the typical value of $\gamma_2(T,d_\omega)$.


In other words, the balls $B_d(0,\epsilon)$ cannot be too small. The reason for this is simple. If they were very small, since $d_\omega(s,t)$ is not much smaller than $d(s,t)$, the balls $B_{d_\omega}(0,\epsilon)$ would also typically be very small, and then $\gamma_2(T,d_\omega)$ would typically be very large. This principle is adapted to finding lower bounds.

We will be able to use a suitable version of Principle A in cases where there is 
no translation invariance, and it is at the root of the results of Chap. 11. 

An equally simple principle works the other way around and is adapted to finding
upper bounds on trigonometric sums.


Principle B If, given $s,t\in T$, the distance $d_\omega(s,t)$ is typically not too much larger than $d(s,t)$, then the typical value of $\gamma_2(T,d_\omega)$ is controlled from above by $\gamma_2(T,d)$.


The reason is that $\gamma_2(T,d)$ controls from below the size of the balls $B_d(0,\epsilon)$, in the sense that for a given value of $\gamma_2(T,d)$ the measure of these balls cannot be very small. This in turn implies that the balls $B_{d_\omega}(0,\epsilon)$ cannot be too small because $d_\omega$ is typically not much larger than $d$. But this controls from above the size of $\gamma_2(T,d_\omega)$.$^{16}$


15 The setting for these ideas is somewhat more general than the specific situation considered above.
16 This last step is unfortunately specific to the case of translation-invariant distances.
The following theorem is an implementation of Principle B and is at the root of
a main result of this chapter, the Marcus-Pisier theorem, Theorem 7.4.2:$^{17}$


Theorem 7.3.1 Consider a translation-invariant distance $d_\omega$ on $T$ that depends on a random parameter $\omega$. Assuming enough measurability and integrability, consider the distance $\bar d$ given by

$$\bar d(s,t) = \mathrm{E}d_\omega(s,t)\,. \tag{7.18}$$

Then

$$\mathrm{E}\gamma_2(T,d_\omega) \le L\gamma_2(T,\bar d) + L\,\mathrm{E}\Delta(T,d_\omega)\,. \tag{7.19}$$
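Proof It is obvious that $\bar d$ is a distance. Consider the corresponding numbers $\bar\epsilon_n$ as in (7.3). For each $n\ge0$ let us set $B_n := B_{\bar d}(0,\bar\epsilon_n)$, so that $\mu(B_n) \ge N_n^{-1}$ by definition of $\bar\epsilon_n$. Let us define

$$b_n(\omega) = \frac{1}{\mu(B_n)}\int_{B_n} d_\omega(0,t)\,d\mu(t)\,. \tag{7.20}$$

Markov's inequality used for the measure $\mu$ at the given $\omega$ implies $\mu(\{t\in B_n;\ d_\omega(0,t) \ge 2b_n(\omega)\}) \le \mu(B_n)/2$. Consequently

$$\mu(\{t\in B_n;\ d_\omega(0,t) < 2b_n(\omega)\}) \ge \frac12\mu(B_n) \ge \frac12 N_n^{-1} \ge N_{n+1}^{-1}\,,$$

so that $\epsilon_{n+1}(\omega) \le 2b_n(\omega)$, where of course $\epsilon_{n+1}(\omega)$ is defined as in (7.3) for the distance $d_\omega$. Also, $\epsilon_0(\omega) \le \Delta(T,d_\omega)$, so that

$$\begin{aligned}\sum_{n\ge0}\epsilon_n(\omega)2^{n/2} &\le L\Delta(T,d_\omega) + L\sum_{n\ge1}\epsilon_n(\omega)2^{n/2}\\ &= L\Delta(T,d_\omega) + L\sum_{n\ge0}\epsilon_{n+1}(\omega)2^{n/2}\\ &\le L\Delta(T,d_\omega) + L\sum_{n\ge0}b_n(\omega)2^{n/2}\,.\end{aligned}$$

Thus (7.4) implies

$$\gamma_2(T,d_\omega) \le L\Delta(T,d_\omega) + L\sum_{n\ge0}b_n(\omega)2^{n/2}\,.$$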


17 We will not need Principle A for this result, as in the special situation considered there one may
find specific arguments for lower bounds.
Taking expectations yields

$$\mathrm{E}\gamma_2(T,d_\omega) \le L\,\mathrm{E}\Delta(T,d_\omega) + L\sum_{n\ge0}\mathrm{E}b_n(\omega)2^{n/2}\,. \tag{7.21}$$

For $t\in B_n$ we have $\mathrm{E}d_\omega(0,t) = \bar d(0,t) \le \bar\epsilon_n$, so that taking expectation in (7.20), we obtain $\mathrm{E}b_n(\omega) \le \bar\epsilon_n$. Thus (7.19) follows from (7.21) and (7.4). □
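The only probabilistic step in this proof is the Markov bound $\epsilon_{n+1}(\omega)\le 2b_n(\omega)$, which holds for every fixed $\omega$. The Python sketch below checks it numerically under assumptions that are not in the text: $T=\mathbb Z_N$ with the uniform (Haar) probability, $d_\omega$ built as in (7.16) from hypothetical Gaussian amplitudes and the characters $t\mapsto e^{2\pi ikt/N}$, closed balls in (7.3), and a random set $B$ of measure at least $N_n^{-1}$ standing in for $B_n$. All function names are hypothetical helpers.

```python
import cmath
import math
import random
from fractions import Fraction


def entropy_eps(dist0, n):
    """epsilon_n for a translation-invariant distance on Z_N (uniform measure, closed balls):
    the smallest radius whose ball around 0 has measure >= 1/N_n, with N_n = 2**(2**n).
    dist0[t] is d(0, t)."""
    N = len(dist0)
    k = math.ceil(Fraction(N, 2 ** (2 ** n)))  # fewest points with k/N >= 1/N_n
    return sorted(dist0)[k - 1]


def random_distance0(N, freqs, rng):
    """One draw of d_omega(0, t) = (sum_i |xi_i|^2 |chi_i(t) - 1|^2)^(1/2), t in Z_N,
    with hypothetical standard Gaussian xi_i and chi_i(t) = exp(2 pi i k_i t / N)."""
    xi = [rng.gauss(0.0, 1.0) for _ in freqs]
    return [
        math.sqrt(sum(x * x * abs(cmath.exp(2j * math.pi * k * t / N) - 1) ** 2
                      for x, k in zip(xi, freqs)))
        for t in range(N)
    ]


def markov_step_holds(dist0, B, n):
    """Check epsilon_{n+1}(omega) <= 2 b_n(omega), b_n the average of d_omega(0, .) over B."""
    N = len(dist0)
    assert Fraction(len(B), N) >= Fraction(1, 2 ** (2 ** n))  # mu(B) >= 1/N_n
    b = sum(dist0[t] for t in B) / len(B)
    return entropy_eps(dist0, n + 1) <= 2 * b


rng = random.Random(0)
N, freqs = 64, [1, 3, 7, 20]
for _ in range(50):
    d0 = random_distance0(N, freqs, rng)
    for n in range(4):
        B = rng.sample(range(N), math.ceil(Fraction(N, 2 ** (2 ** n))))
        assert markov_step_holds(d0, B, n)
```

Exact fractions are used for the measure thresholds so that the comparison $k/N\ge N_n^{-1}$ is not affected by rounding.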


Exercise 7.3.2 


(a) Use Lemma 7.1.3 to prove that $\mu(A_\omega) \le 1/2$, where $A_\omega = \{t\in T;\ d_\omega(0,t) \le \Delta(T,d_\omega)/4\}$.
(b) Prove that the last term is not necessary in (7.19). Hint: This is harder. 


Exercise 7.3.3 Show that if $T$ is an arbitrary metric space and $d_\omega$ an arbitrary
random metric, then (7.19) need not hold. Caution: This is not trivial.


7.3.2 A General Upper Bound 


We turn to upper bounds, which are a rather simple consequence of the work of
the previous subsection. Not only are these bounds interesting in their own right,
they are also a basic ingredient of the Marcus-Pisier theorem, the central result of the
next section.


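Theorem 7.3.4 Assume that the r.v.s $\xi_i$ are symmetric and independent and have a second moment, and consider on $T$ the distance $d$ given by

$$d(s,t)^2 = \sum_i\mathrm{E}|\xi_i|^2|\chi_i(s) - \chi_i(t)|^2\,. \tag{7.22}$$

Then

$$\mathrm{E}\Big\|\sum_i\xi_i\chi_i\Big\| \le L\gamma_2(T,d) + L\Big(\sum_i\mathrm{E}|\xi_i|^2\Big)^{1/2}\,. \tag{7.23}$$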


If $X_t = \sum_i\xi_i\chi_i(t)$, (7.22) implies $\mathrm{E}|X_s - X_t|^2 \le d(s,t)^2$, but it does not seem possible to say much more (such as controlling higher moments of the r.v.s $|X_s - X_t|$) unless one assumes more on the r.v.s $\xi_i$, e.g., that they are Gaussian. Also, as we will learn in Sect. 16.8, the condition $\mathrm{E}|X_s - X_t|^2 \le d(s,t)^2$ is way too weak by itself to ensure the regularity of the process. Therefore it is at first surprising to obtain a conclusion as strong as (7.23). Theorem 7.3.4 is another deceptively simple-looking result on which the reader should meditate.
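Proof of Theorem 7.3.4 We write

$$\mathrm{E}\sup_{t\in T}\Big|\sum_i\xi_i\chi_i(t)\Big| \le \mathrm{E}\sup_{t\in T}\Big|\sum_i(\xi_i\chi_i(t) - \xi_i\chi_i(0))\Big| + \mathrm{E}\Big|\sum_i\xi_i\chi_i(0)\Big|\,. \tag{7.24}$$

Since for each character $\chi$ we have $\chi(0) = 1$, using the Cauchy-Schwarz inequality (and the fact that the independent symmetric r.v.s $\xi_i$ are centered) we have $\mathrm{E}|\sum_i\xi_i\chi_i(0)| \le (\mathrm{E}|\sum_i\xi_i|^2)^{1/2} = (\sum_i\mathrm{E}|\xi_i|^2)^{1/2}$, so that it suffices to prove that

$$\mathrm{E}\sup_{s,t\in T}\Big|\sum_i(\xi_i\chi_i(t) - \xi_i\chi_i(s))\Big| \le L\gamma_2(T,d) + L\Big(\sum_i\mathrm{E}|\xi_i|^2\Big)^{1/2}\,. \tag{7.25}$$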


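Since the r.v.s are symmetric, the sum $\sum_i\xi_i\chi_i$ has the same distribution as the sum $\sum_i\varepsilon_i\xi_i\chi_i$, where the Bernoulli r.v.s $\varepsilon_i$ are independent and independent of the r.v.s $\xi_i$, and in particular$^{18}$

$$\mathrm{E}\sup_{s,t\in T}\Big|\sum_i(\xi_i\chi_i(t) - \xi_i\chi_i(s))\Big| = \mathrm{E}\sup_{s,t\in T}\Big|\sum_i(\varepsilon_i\xi_i\chi_i(t) - \varepsilon_i\xi_i\chi_i(s))\Big|\,. \tag{7.26}$$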


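For clarity let us assume that the underlying probability space is a product $\Omega\times\Omega'$, with a product probability $\mathrm{P} = \mathrm{P}_\xi\otimes\mathrm{P}_\varepsilon$, and that if $(\omega,\omega')$ is the generic point of this product, then $\xi_i = \xi_{i,\omega}$ depends on $\omega$ only and $\varepsilon_i = \varepsilon_{i,\omega'}$ depends on $\omega'$ only. For each $\omega$ define the distance $d_\omega$ on $T$ by

$$d_\omega(s,t)^2 = \sum_i|\xi_{i,\omega}|^2|\chi_i(s) - \chi_i(t)|^2\,,$$

so that

$$\Delta(T,d_\omega)^2 \le 4\sum_i|\xi_{i,\omega}|^2 \tag{7.27}$$

and

$$\mathrm{E}d_\omega(s,t)^2 = \sum_i\mathrm{E}|\xi_i|^2|\chi_i(s) - \chi_i(t)|^2 = d(s,t)^2\,. \tag{7.28}$$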


18 Maybe this is the place to stress the obvious. When there are several sources of randomness, such
as in the second term below, the operator $\mathrm{E}$ takes expectation over all of them.
The Cauchy-Schwarz inequality shows that the distance $\bar d$ given by $\bar d(s,t) = \mathrm{E}d_\omega(s,t)$ satisfies $\bar d \le d$, and it also shows from (7.27) that

$$\mathrm{E}\Delta(T,d_\omega) \le 2\Big(\sum_i\mathrm{E}|\xi_i|^2\Big)^{1/2}\,. \tag{7.29}$$


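Next, denoting by $\mathrm{E}_\varepsilon$ expectation in $\omega'$ only,$^{19}$ we use (7.12) with $\eta_i(s) = \xi_{i,\omega}\chi_i(s)$ (so that then $d^* = d_\omega$) to obtain that for each $\omega$ we have

$$\mathrm{E}_\varepsilon\sup_{s,t\in T}\Big|\sum_i(\varepsilon_{i,\omega'}\xi_{i,\omega}\chi_i(t) - \varepsilon_{i,\omega'}\xi_{i,\omega}\chi_i(s))\Big| \le L\gamma_2(T,d_\omega)\,.$$

Taking expectation and using (7.26) we obtain

$$\mathrm{E}\sup_{s,t\in T}\Big|\sum_i(\xi_i\chi_i(t) - \xi_i\chi_i(s))\Big| \le L\,\mathrm{E}\gamma_2(T,d_\omega)\,. \tag{7.30}$$

The distances $d_\omega$ are translation-invariant, as follows from the facts that $\chi_i(s+u) = \chi_i(s)\chi_i(u)$ and $|\chi_i(u)| = 1$ for each $i$, so that (7.19) implies

$$\mathrm{E}\gamma_2(T,d_\omega) \le L\gamma_2(T,\bar d) + L\,\mathrm{E}\Delta(T,d_\omega) \le L\gamma_2(T,d) + L\,\mathrm{E}\Delta(T,d_\omega)\,.$$

The desired inequality (7.25) then follows by combining this with (7.29). □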


7.3.3 A Side Story 


Let us now start a side story and discuss the second term in the right-hand side of (7.23). If we consider the case where for all $i$, $\chi_i = \mathbf 1$, the character taking value 1 everywhere, then $d = 0$ and this second term is really needed. This is however basically the only case where this term is needed, as the following shows:

Lemma 7.3.5 Assume

$$\forall i,\quad \chi_i \ne \mathbf 1\,. \tag{7.31}$$

Then, recalling the distance $d$ of (7.22),

$$2\sum_i\mathrm{E}|\xi_i|^2 \le \Delta(T,d)^2 \tag{7.32}$$


19 The idea behind the notation $\mathrm{E}_\varepsilon$ is that "we take expectation in the randomness of the $\varepsilon_i$ only".
and in particular we may replace (7.23) by

$$\mathrm{E}\Big\|\sum_i\xi_i\chi_i\Big\| \le L\gamma_2(T,d)\,. \tag{7.33}$$


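Lemma 7.3.6 Two different characters are orthogonal in $L^2(T,\mu)$.

Proof Consider two such characters $\chi$ and $\chi'$. Then for any $s\in T$,

$$\int\chi(u)\overline{\chi'(u)}\,d\mu(u) = \int\chi(s+u)\overline{\chi'(s+u)}\,d\mu(u) = \chi(s)\overline{\chi'(s)}\int\chi(u)\overline{\chi'(u)}\,d\mu(u)\,.$$

Thus, either $\chi$ and $\chi'$ are orthogonal in $L^2(T,\mu)$ or else $\chi(s)\overline{\chi'(s)} = 1$ for all $s$, i.e., $\chi = \chi'$. □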


Corollary 7.3.7 For each character $\chi \ne \mathbf 1$ it holds that

$$\int|\chi(s) - 1|^2\,d\mu(s) = 2\,. \tag{7.34}$$
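For a concrete finite example, Lemma 7.3.6 and (7.34) can be checked directly on the cyclic group $\mathbb Z_N$. This choice of group is an assumption made only for illustration: its characters are $\chi_k(s)=e^{2\pi iks/N}$ for $k=0,\dots,N-1$ and its Haar measure is the uniform probability. The helper names in this sketch are hypothetical.

```python
import cmath
import math


def character(k, N):
    """Values of the character chi_k(s) = exp(2 pi i k s / N) of Z_N at s = 0, ..., N - 1."""
    return [cmath.exp(2j * math.pi * k * s / N) for s in range(N)]


def haar_inner(f, g):
    """Inner product in L^2(Z_N, mu), mu the uniform (Haar) probability."""
    return sum(a * b.conjugate() for a, b in zip(f, g)) / len(f)


def mean_sq_dist_to_one(k, N):
    """Integral of |chi_k - 1|^2 against the Haar measure, the left-hand side of (7.34)."""
    return sum(abs(c - 1) ** 2 for c in character(k, N)) / N


N = 12
# Gram matrix of all characters: Lemma 7.3.6 says it is the identity
gram = [[haar_inner(character(k, N), character(l, N)) for l in range(N)] for k in range(N)]
```

The value 2 in (7.34) comes from $|\chi-1|^2 = 2 - 2\Re\chi$ and $\int\chi\,d\mu = 0$ for $\chi\ne\mathbf 1$, which is the orthogonality of $\chi$ and $\mathbf 1$.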


Proof of Lemma 7.3.5 We integrate in $s$ the equality $\sum_i\mathrm{E}|\xi_i|^2|\chi_i(s) - \chi_i(0)|^2 = d(s,0)^2$ and use (7.34) to obtain

$$2\sum_i\mathrm{E}|\xi_i|^2 = \int d(s,0)^2\,d\mu(s) \le \Delta(T,d)^2\,. \qquad □$$
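The identity in this proof can also be checked numerically. The sketch below again works on $\mathbb Z_N$ (an illustrative assumption), uses hypothetical values $c_k=\mathrm{E}|\xi_k|^2$ attached to frequencies $k$ that are nonzero modulo $N$ (so that (7.31) holds), and compares $2\sum_k c_k$ with the Haar integral of $d(s,0)^2$ and with $\Delta(T,d)^2$.

```python
import cmath
import math
import random


def dist_sq(s, t, weights, N):
    """d(s,t)^2 of (7.22) on Z_N: sum over k of c_k |chi_k(s) - chi_k(t)|^2, where weights
    maps a frequency k (nonzero mod N, so chi_k != 1) to c_k = E|xi_k|^2."""
    return sum(c * abs(cmath.exp(2j * math.pi * k * s / N) - cmath.exp(2j * math.pi * k * t / N)) ** 2
               for k, c in weights.items())


def lemma_7_3_5_terms(weights, N):
    """Return (2 sum_k c_k, integral of d(s,0)^2 dmu(s), Delta(T,d)^2) on Z_N."""
    integral = sum(dist_sq(s, 0, weights, N) for s in range(N)) / N  # uniform Haar measure
    diameter_sq = max(dist_sq(s, t, weights, N) for s in range(N) for t in range(N))
    return 2 * sum(weights.values()), integral, diameter_sq


rng = random.Random(1)
weights = {k: rng.uniform(0.1, 2.0) for k in (1, 2, 5, 9)}  # hypothetical second moments
two_sum, integral, diameter_sq = lemma_7_3_5_terms(weights, 16)
```

The first two numbers agree up to rounding, and the third dominates them, as (7.32) asserts.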


We lighten the exposition by always assuming (7.31).

This holds even when we forget to repeat it. The only difference this assumption makes is that we no longer have to bother writing the term $(\sum_i\mathrm{E}|\xi_i|^2)^{1/2}$. For example, (7.23) becomes

$$\mathrm{E}\Big\|\sum_i\xi_i\chi_i\Big\| \le L\gamma_2(T,d)\,. \tag{7.35}$$


The following exercise insists on the fact that assuming (7.31) loses no real
information. It simply avoids writing an extra term $\mathrm{E}|\xi_{i_0}|$ both in the upper and
lower bounds.


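Exercise 7.3.8 Assume that $\chi_{i_0} = \mathbf 1$ and that $\chi_i \ne \mathbf 1$ when $i \ne i_0$. Prove that

$$\mathrm{E}|\xi_{i_0}| \le \mathrm{E}\Big\|\sum_i\xi_i\chi_i\Big\| \le \mathrm{E}|\xi_{i_0}| + L\gamma_2(T,d)\,. \tag{7.36}$$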


Exercise 7.3.9 The present exercise deduces classical bounds for trigonometric sums from (7.33). (If it is not obvious that it deals with these, please read Sect. 7.2.1 again.) We consider the case where $T = \mathbb U$ is the set of complex numbers of modulus 1 and where $\chi_i(t) = t^i$, the $i$-th power of $t$. We observe the bound

$$|s^i - t^i| \le \min(2, |i||s-t|)\,. \tag{7.37}$$

Let $c_i = \mathrm{E}|\xi_i|^2$, and consider the distance $d$ of (7.22), $d(s,t)^2 = \sum_i c_i|s^i - t^i|^2$. Let $b_0 = \sum_{|i|\le3}c_i$ and for $n\ge1$, let $b_n = \sum_{N_n\le|i|<N_{n+1}}c_i$. Prove that

$$\gamma_2(T,d) \le L\sum_{n\ge0}2^{n/2}\sqrt{b_n}\,, \tag{7.38}$$

and consequently from (7.23)

$$\mathrm{E}\Big\|\sum_i\xi_i\chi_i\Big\| \le L\sum_{n\ge0}2^{n/2}\sqrt{b_n}\,. \tag{7.39}$$

Hint: Here since the group is in multiplicative form, the unit is 1 rather than 0. Observe that $d(t,1)^2 \le \sum_i c_i\min(4, |i|^2|t-1|^2)$. Use this bound to prove that the quantity $\epsilon_n$ of Theorem 7.1.1 satisfies $\epsilon_n^2 \le L\sum_i c_i\min(1, |i|^2 2^{-2^{n+1}})$ and conclude using (7.4). If you find this exercise too hard, you will find its solution in Sect. 7.12.
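The two concrete ingredients of this exercise, the elementary bound (7.37) and the grouping of the $c_i$ into the blocks $b_n$, are easy to compute. The sketch below is not a solution of the exercise: it checks (7.37) at random points of $\mathbb U$ and evaluates $\sum_{n\ge0}2^{n/2}\sqrt{b_n}$ without the constant $L$, under the block convention $b_0=\sum_{|i|\le3}c_i$, $b_n=\sum_{N_n\le|i|<N_{n+1}}c_i$ with $N_n=2^{2^n}$ (our reading of the definition). The values of $c_i$ and all names are hypothetical.

```python
import cmath
import math
import random


def check_bound_7_37(i, s, t, tol=1e-12):
    """(7.37) for s, t of modulus 1: |s^i - t^i| <= min(2, |i| |s - t|)."""
    return abs(s ** i - t ** i) <= min(2.0, abs(i) * abs(s - t)) + tol


def dyadic_blocks(c):
    """Group c_i = E|xi_i|^2 into b_0 = sum_{|i| <= 3} c_i and, for n >= 1,
    b_n = sum_{N_n <= |i| < N_{n+1}} c_i with N_n = 2**(2**n)."""
    b = {}
    for i, ci in c.items():
        n = 0
        if abs(i) >= 4:  # 4 = N_1
            n = 1
            while abs(i) >= 2 ** (2 ** (n + 1)):  # move to the block containing |i|
                n += 1
        b[n] = b.get(n, 0.0) + ci
    return b


def dyadic_sum(c):
    """sum_n 2^(n/2) sqrt(b_n): the right-hand side of (7.38) and (7.39) without the constant L."""
    return sum(2 ** (n / 2) * math.sqrt(bn) for n, bn in dyadic_blocks(c).items())


rng = random.Random(2)
for _ in range(2000):
    s = cmath.exp(1j * rng.uniform(0, 2 * math.pi))
    t = cmath.exp(1j * rng.uniform(0, 2 * math.pi))
    assert check_bound_7_37(rng.randint(-50, 50), s, t)
```

Because $N_{n+1}=N_n^2$, a frequency of size $|i|$ lands in block $n\approx\log_2\log_2|i|$, which is why only very few blocks appear even for long sums.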


Exercise 7.3.10 For a trigonometric polynomial $A = \sum_i a_i\chi_i$ (where the $\chi_i$ are all distinct), let us set $\mathcal N(A) = \mathrm{E}\|\sum_i a_ig_i\chi_i\|$, where the $g_i$ are independent standard Gaussian r.v.s, and $\|A\|_{\mathcal P} = \mathcal N(A) + \|A\|$. This exercise is devoted to the proof that $\|AB\|_{\mathcal P} \le L\|A\|_{\mathcal P}\|B\|_{\mathcal P}$, a result of G. Pisier proved in [83].

(a) Prove that the distance (7.14) satisfies $d(s,t) = \|A^s - A^t\|_2$ where $A^s(u) = A(s+u)$ for $s,u\in T$.

(b) If $\chi_{i_0} = \mathbf 1$ prove that $|a_{i_0}| \le \|A\|$.

(c) Prove that $\|A\| + \gamma_2(T,d) \le L\|A\|_{\mathcal P} \le L(\|A\| + \gamma_2(T,d))$. Hint: Use (7.36).

(d) Prove the desired result. Hint: $A^sB^s - A^tB^t = (A^s - A^t)B^s + A^t(B^s - B^t)$. Use also (4.55) and Exercise 2.7.4.
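Part (a) is Parseval's identity in disguise: $A^s - A^t = \sum_i a_i(\chi_i(s)-\chi_i(t))\chi_i$, and the $\chi_i$ are orthonormal. The sketch below checks the identity numerically on $\mathbb Z_N$ (an illustrative choice of group, with the uniform Haar probability) for a hypothetical polynomial with distinct frequencies; the function names are ours.

```python
import cmath
import math


def chi(k, u, N):
    """Character chi_k(u) = exp(2 pi i k u / N) of Z_N."""
    return cmath.exp(2j * math.pi * k * u / N)


def dist_7_14(s, t, coeffs, N):
    """d(s,t) = (sum_k |a_k|^2 |chi_k(s) - chi_k(t)|^2)^(1/2); coeffs maps distinct k to a_k."""
    return math.sqrt(sum(abs(a) ** 2 * abs(chi(k, s, N) - chi(k, t, N)) ** 2 for k, a in coeffs.items()))


def translate_l2_dist(s, t, coeffs, N):
    """||A^s - A^t||_2 in L^2(Z_N, uniform), where A = sum_k a_k chi_k and A^s(u) = A(s + u)."""
    A = [sum(a * chi(k, u, N) for k, a in coeffs.items()) for u in range(N)]
    return math.sqrt(sum(abs(A[(s + u) % N] - A[(t + u) % N]) ** 2 for u in range(N)) / N)


N = 10
coeffs = {0: 1.5, 2: -1 + 2j, 7: 0.3j}  # hypothetical polynomial, frequencies distinct mod N
```

The constant term $k=0$ (the character $\mathbf 1$) contributes nothing to either side, consistent with part (b) treating it separately.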


Exercise 7.3.11 This exercise continues the previous one. It contains part of the work needed to compute the dual norm of $\mathcal N$, as also achieved in [83].

(a) Prove that if a linear operator $U$ from the space of trigonometric polynomials into itself commutes with translations (in the sense that $(U(A))^s = U(A^s)$ with the notation of the previous exercise), then for each character $\chi$ one has $U(\chi) = u_\chi\chi$ for a certain $u_\chi$. (In words: $U$ is a "multiplier".)

(b) For a function $f$ on $T$ consider the norm

$$\|f\|_{\psi_2} = \inf\Big\{c > 0;\ \int\exp(|f|^2/c^2)\,d\mu \le 2\Big\}\,,$$

and for a linear operator $U$ as in (a) let

$$\|U\|_{2,\psi_2} = \inf\{C > 0;\ \forall A,\ \|U(A)\|_{\psi_2} \le C\|A\|_2\}\,.$$

Given a trigonometric polynomial $A$, think of $X_t := U(A)^t$ as an r.v. on the space $(T,\mu)$. Use (a) of the previous exercise to prove that it satisfies the increment condition $\|X_s - X_t\|_{\psi_2} \le L\|U\|_{2,\psi_2}d(s,t)$. Relate this to (2.4) and use (2.60) to prove that $\sup_{s,t\in T}|U(A)(s) - U(A)(t)| \le L\mathcal N(A)\|U\|_{2,\psi_2}$.

(c) Prove that $|U(A)(0)| \le L\mathcal N(A)\|U\|_{2,\psi_2}$.


7.4 The Marcus-Pisier Theorem 


7.4.1 The Marcus-Pisier Theorem 


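As a special case of (6.7), since $\mathrm{E}|g| = \sqrt{2/\pi}$ when $g$ is a standard Gaussian r.v., we get the following version of (6.6):

$$\mathrm{E}\Big\|\sum_i\varepsilon_ix_i\Big\| \le \sqrt{\frac{\pi}{2}}\,\mathrm{E}\Big\|\sum_ig_ix_i\Big\|\,. \tag{7.40}$$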


Exercise 7.4.1 Prove that the inequality (7.40) cannot be reversed in general. More
precisely find a situation where the sum is of length $n$ and the right-hand side is
about $\sqrt{\log n}$ times larger than the left-hand side.
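For trigonometric sums the two sides of (7.40) turn out to be of the same order, which is the content of the Marcus-Pisier theorem. The Monte Carlo sketch below only illustrates this on one hypothetical example and proves nothing: $T=\mathbb Z_{32}$, $a_k=1$ for $1\le k\le16$, and the supremum norm computed over the 32 points of the group. It estimates $\mathrm{E}\|\sum_k a_k\varepsilon_k\chi_k\|$ and $\mathrm{E}\|\sum_k a_kg_k\chi_k\|$.

```python
import cmath
import math
import random


def sup_norm_samples(coeffs, N, draw, n_samples, rng):
    """Monte Carlo samples of || sum_k a_k eta_k chi_k ||, the sup being taken over Z_N,
    where the eta_k are i.i.d. with eta_k = draw(rng)."""
    table = {k: [cmath.exp(2j * math.pi * k * t / N) for t in range(N)] for k in coeffs}
    samples = []
    for _ in range(n_samples):
        w = {k: a * draw(rng) for k, a in coeffs.items()}
        samples.append(max(abs(sum(w[k] * table[k][t] for k in w)) for t in range(N)))
    return samples


rng = random.Random(3)
N = 32
coeffs = {k: 1.0 for k in range(1, 17)}  # hypothetical: a_k = 1 for 1 <= k <= 16
bern = sup_norm_samples(coeffs, N, lambda r: r.choice((-1.0, 1.0)), 1000, rng)
gauss = sup_norm_samples(coeffs, N, lambda r: r.gauss(0.0, 1.0), 1000, rng)
mean_bern = sum(bern) / len(bern)
mean_gauss = sum(gauss) / len(gauss)
```

In such experiments the Gaussian average exceeds the Bernoulli one only by a modest factor, in contrast with the $\sqrt{\log n}$ gap that Exercise 7.4.1 asks for in a general Banach space.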


In this subsection we prove the fundamental fact that in the setting of random trigonometric sums, where the Banach space is the space of continuous functions on $T$ provided with the supremum norm, and when $x_i$ is the function $a_i\chi_i$, we can reverse the general inequality (7.40). As a consequence we obtain in Corollary 7.4.5 below the estimate for $\mathrm{E}\|\sum_ia_i\varepsilon_i\chi_i\|$ on which all our further work relies.

Once this difficult result has been obtained, it is a very easy matter to achieve our goal of finding upper and lower bounds on the quantities $\mathrm{E}\|\sum_i\xi_i\chi_i\|$ under the extra condition that the r.v.s "$\xi_i$ have $L^1$ and $L^2$ norms of the same order". This assumption will later be removed, but this requires considerable work.


Theorem 7.4.2 (The Marcus-Pisier Theorem [61]) Consider complex numbers $a_i$, independent Bernoulli r.v.s $\varepsilon_i$, and independent standard Gaussian r.v.s $g_i$. Then

$$\mathrm{E}\Big\|\sum_ia_ig_i\chi_i\Big\| \le L\,\mathrm{E}\Big\|\sum_ia_i\varepsilon_i\chi_i\Big\|\,. \tag{7.41}$$
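Proof The argument resembles that of Theorem 6.4.1.$^{20}$ Consider a number $c > 0$. Then

$$\mathrm{E}\Big\|\sum_ia_ig_i\chi_i\Big\| \le \mathrm{I} + \mathrm{II}\,, \tag{7.42}$$

where

$$\mathrm{I} = \mathrm{E}\Big\|\sum_ia_ig_i\mathbf 1_{\{|g_i|\le c\}}\chi_i\Big\|$$

and

$$\mathrm{II} = \mathrm{E}\Big\|\sum_ia_ig_i\mathbf 1_{\{|g_i|>c\}}\chi_i\Big\|\,.$$

Let us define $u(c) = (\mathrm{E}g^2\mathbf 1_{\{|g|>c\}})^{1/2}$, where $g$ is a standard Gaussian r.v. Consider the distance $d$ given by (7.14). When $\xi_i = a_ig_i\mathbf 1_{\{|g_i|>c\}}$, we have $\mathrm{E}|\xi_i|^2 = |a_i|^2u(c)^2$, so that the distance $d'$ given by (7.22) satisfies $d' = u(c)d$. Thus $\gamma_2(T,d') = u(c)\gamma_2(T,d)$ and (7.35) implies

$$\mathrm{II} \le Lu(c)\gamma_2(T,d)\,. \tag{7.43}$$

Recalling the lower bound (7.13), it follows that we can choose $c$ a universal constant large enough that $\mathrm{II} \le (1/2)\mathrm{E}\|\sum_ia_ig_i\chi_i\|$. We fix such a value of $c$. Then (7.42) entails

$$\mathrm{E}\Big\|\sum_ia_ig_i\chi_i\Big\| \le 2\,\mathrm{I}\,.$$

Consider independent Bernoulli r.v.s $\varepsilon_i$ that are independent of the r.v.s $g_i$, so that by symmetry

$$\mathrm{I} = \mathrm{E}\Big\|\sum_ia_i\varepsilon_ig_i\mathbf 1_{\{|g_i|\le c\}}\chi_i\Big\|\,.$$

The contraction principle (Lemma 6.4.4) used given the randomness of the variables $g_i$ yields $\mathrm{I} \le c\,\mathrm{E}\|\sum_ia_i\varepsilon_i\chi_i\|$, which completes the proof. □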
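The proof only needs that $u(c)\to0$ as $c\to\infty$, so that $c$ can be chosen as a universal constant. An integration by parts (a standard computation, not taken from the text) gives the closed form $u(c)^2 = 2c\,\phi(c) + \operatorname{erfc}(c/\sqrt2)$, where $\phi$ denotes the standard Gaussian density. The sketch below evaluates this closed form and cross-checks it against a direct numerical integration; the function names are hypothetical.

```python
import math


def u_closed_form(c):
    """u(c) = (E g^2 1_{|g| > c})^(1/2) for a standard Gaussian g, from the identity
    E g^2 1_{|g| > c} = 2 c phi(c) + erfc(c / sqrt(2)), phi the standard Gaussian density."""
    phi = math.exp(-c * c / 2) / math.sqrt(2 * math.pi)
    return math.sqrt(2 * c * phi + math.erfc(c / math.sqrt(2)))


def u_numeric(c, upper=12.0, steps=100_000):
    """Same quantity via the midpoint rule for 2 * int_c^upper x^2 phi(x) dx
    (the Gaussian tail beyond `upper` is negligible and ignored)."""
    h = (upper - c) / steps
    total = 0.0
    for j in range(steps):
        x = c + (j + 0.5) * h
        total += x * x * math.exp(-x * x / 2)
    return math.sqrt(2 * total * h / math.sqrt(2 * math.pi))
```

At $c=0$ the formula gives $u(0)=1$, i.e., $\mathrm{E}g^2=1$, and $u(c)$ decreases quickly, roughly like $\sqrt{2c\,\phi(c)}$ for large $c$.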


Exercise 7.4.3 Show that (7.41) does not hold when the $\chi_i$ are general maps from $T$ to
$\mathbb C$ with $|\chi_i(t)| = 1$.


Exercise 7.4.4 In this exercise we deduce the Marcus-Pisier theorem from Theorem 6.2.8 (the Latała-Bednorz theorem). This is not an economical way to proceed, since the proof of the Latała-Bednorz theorem is very much harder than the proof of the Marcus-Pisier theorem. Nonetheless the argument is very instructive as to what role translation invariance plays. We set $S = \mathrm{E}\|\sum_ia_i\varepsilon_i\chi_i\|$.

(a) Use Theorem 6.2.8 to show that we have a decomposition $a_i\chi_i(t) = u_i(t) + v_i(t)$ where $\sum_i|u_i(t)| \le LS$ and $\mathrm{E}\sup_{t\in T}|\sum_ig_iv_i(t)| \le LS$.

(b) Apply the previous decomposition to $t+s$ instead of $t$ and average over $s$ to prove that we have $a_i = u_i + v_i$ where $\sum_i|u_i| \le LS$ and $\mathrm{E}\sup_{t\in T}|\sum_ig_iv_i\chi_i(t)| \le LS$. Conclude.

20 This is not a coincidence. I studied the Marcus-Pisier theorem before I invented Theorem 6.4.1.


Combining (7.41) with (7.13), recalling the distance $d$ of (7.14), and using (7.33) for the upper bound, we obtain the following fundamental result:

Corollary 7.4.5 We have

$$\frac1L\gamma_2(T,d) \le \mathrm{E}\Big\|\sum_ia_i\varepsilon_i\chi_i\Big\| \le L\gamma_2(T,d)\,. \tag{7.44}$$


The next technical lemma makes the left-hand side of (7.44) more precise. Its proof reveals why we controlled the squares in (7.10). The relevance of this lemma will become clear only later.

Lemma 7.4.6 We have

$$\mathrm{P}\Big(\Big\|\sum_ia_i\varepsilon_i\chi_i\Big\| \ge \frac1L\gamma_2(T,d)\Big) \ge \frac1L\,. \tag{7.45}$$

Proof The r.v. $X = \|\sum_i\varepsilon_ia_i\chi_i\|$ satisfies $\mathrm{E}X \ge \gamma_2(T,d)/L$ by (7.44). Combining (7.6), (7.10), and (7.32), it also satisfies $\mathrm{E}X^2 \le L\gamma_2(T,d)^2$. The conclusion follows from the Paley-Zygmund inequality (6.15). □
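In its standard form, the Paley-Zygmund inequality states that a nonnegative r.v. $X$ satisfies $\mathrm{P}(X > \theta\,\mathrm{E}X) \ge (1-\theta)^2(\mathrm{E}X)^2/\mathrm{E}X^2$ for $0\le\theta\le1$; with $\mathrm{E}X\ge\gamma_2(T,d)/L$ and $\mathrm{E}X^2\le L\gamma_2(T,d)^2$ the right-hand side is at least $1/L$. Since the inequality holds for every distribution, it holds in particular for an empirical one, which the sketch below uses. The example ($T=\mathbb Z_{32}$, $a_k=1/k$ for $1\le k\le16$) and the function name are hypothetical.

```python
import cmath
import math
import random


def paley_zygmund(samples, theta=0.5):
    """For the empirical law of nonnegative samples X, return the pair
    (P(X > theta E X), (1 - theta)^2 (E X)^2 / E X^2); Paley-Zygmund says first >= second."""
    m1 = sum(samples) / len(samples)
    m2 = sum(x * x for x in samples) / len(samples)
    prob = sum(1 for x in samples if x > theta * m1) / len(samples)
    return prob, (1 - theta) ** 2 * m1 * m1 / m2


rng = random.Random(4)
N = 32
coeffs = {k: 1.0 / k for k in range(1, 17)}  # hypothetical coefficients a_k
table = {k: [cmath.exp(2j * math.pi * k * t / N) for t in range(N)] for k in coeffs}
samples = []
for _ in range(2000):
    eps = {k: rng.choice((-1.0, 1.0)) for k in coeffs}
    samples.append(max(abs(sum(a * eps[k] * table[k][t] for k, a in coeffs.items())) for t in range(N)))
prob, lower = paley_zygmund(samples)
```

The lower bound never exceeds $(1-\theta)^2$, because $(\mathrm{E}X)^2\le\mathrm{E}X^2$; the point of the lemma is that it stays bounded away from 0.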


7.4.2 Applications of the Marcus-Pisier Theorem 


It is now easy to complete the goal of providing upper and lower bounds of
$\mathrm{E}\|\sum_i\xi_i\chi_i\|$ which are of the same order when the $L^1$ and $L^2$ norms of the $\xi_i$ are
of the same order.


Proposition 7.4.7 Consider independent symmetric real-valued random variables $\xi_i$ and characters $\chi_i$. Consider on $T$ the two distances given by

$$d_1(s,t)^2 = \sum_i(\mathrm{E}|\xi_i|)^2|\chi_i(s) - \chi_i(t)|^2$$

and

$$d_2(s,t)^2 = \sum_i\mathrm{E}\xi_i^2\,|\chi_i(s) - \chi_i(t)|^2\,.$$

Then, assuming (7.31), we have

$$\frac1L\gamma_2(T,d_1) \le \mathrm{E}\Big\|\sum_i\xi_i\chi_i\Big\| \le L\gamma_2(T,d_2)\,. \tag{7.46}$$

Proof The right-hand side of (7.46) simply reproduces (7.33). The left-hand side follows by combining (6.7) and (7.44) for $a_i = \mathrm{E}|\xi_i|$. □
Let us set

$$A = \sup_i\frac{(\mathrm{E}\xi_i^2)^{1/2}}{\mathrm{E}|\xi_i|}\,. \tag{7.47}$$

Then $\gamma_2(T,d_2) \le A\gamma_2(T,d_1)$ and we have obtained upper and lower estimates of the quantity $\mathrm{E}\|\sum_ia_i\xi_i\chi_i\|$ whose ratio is $\le LA$. In the case where the r.v.s $\xi_i$ are not square-integrable, we will obviously need other methods. We shall return to this topic later, where we shall be able to estimate the quantity $\mathrm{E}\|\sum_ia_i\xi_i\chi_i\|$ under the only assumption that the r.v.s $\xi_i$ are independent and symmetric. We shall also investigate the convergence of random Fourier series. We simply mention here that for such a series where the quantity $A$ of (7.47) is finite, Proposition 7.4.7 allows us to show that the necessary and sufficient condition for convergence is $\gamma_2(T,d_2) < \infty$. For further use, let us draw a simple consequence of Proposition 7.4.7, whose proof should now be obvious.


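Corollary 7.4.8 With the notations of Proposition 7.4.7 consider also independent symmetric r.v.s $(\theta_i)$. Then we have

$$\mathrm{E}\Big\|\sum_i\theta_i\chi_i\Big\| \le L\sup_i\frac{(\mathrm{E}\theta_i^2)^{1/2}}{\mathrm{E}|\xi_i|}\,\mathrm{E}\Big\|\sum_i\xi_i\chi_i\Big\|\,. \tag{7.48}$$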

It is not always easy to estimate the quantity $\gamma_2(T,d)$ in concrete situations. The book of Marcus and Pisier [61] contains a thorough account (which we will not reproduce) of the link between the present results and the "classical ones". To illustrate the problems that arise, consider, for example, the case where $T = \{-1,1\}^N$ and for $i\le N$ and $t = (t_i)_{i\le N}\in T$, let $\chi_i(t) = t_i$. Since $|t_i| = 1$, for real numbers $a_i$ it holds that $\|\sum_{i\le N}a_i\varepsilon_it_i\| = \sum_{i\le N}|a_i|$. Combining with (7.23) and (7.44), we get

$$\frac1L\sum_{i\le N}|a_i| \le \gamma_2(T,d) \le L\sum_{i\le N}|a_i|\,, \tag{7.49}$$

where $d(s,t)^2 = \sum_{i\le N}a_i^2|\chi_i(s) - \chi_i(t)|^2 = 4\sum_{i\le N}a_i^2\mathbf 1_{\{s_i\ne t_i\}}$. The following exercise is in fact quite challenging:


Exercise 7.4.9 Find a direct proof of (7.49). 
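The two identities used just before (7.49) can be confirmed by brute force for small $N$. The sketch below does not address the exercise itself (which concerns $\gamma_2(T,d)$); it only checks, for hypothetical real coefficients, that $\sup_{t\in\{-1,1\}^N}|\sum_ia_i\varepsilon_it_i| = \sum_i|a_i|$ for every sign pattern $\varepsilon$, and that $d(s,t)^2 = 4\sum_ia_i^2\mathbf 1_{\{s_i\ne t_i\}}$.

```python
import itertools


def sup_norm_hypercube(a, eps):
    """|| sum_i a_i eps_i chi_i || with chi_i(t) = t_i, sup over t in {-1, 1}^N (brute force)."""
    return max(abs(sum(ai * ei * ti for ai, ei, ti in zip(a, eps, t)))
               for t in itertools.product((-1, 1), repeat=len(a)))


def dist_sq_hypercube(s, t, a):
    """d(s,t)^2 = sum_i a_i^2 |chi_i(s) - chi_i(t)|^2 for chi_i(t) = t_i."""
    return sum(ai * ai * (si - ti) ** 2 for ai, si, ti in zip(a, s, t))


a = [0.5, -1.25, 2.0, 0.75]  # hypothetical real coefficients (exact in binary floating point)
```

The supremum is attained at $t_i = \operatorname{sign}(a_i\varepsilon_i)$, which is why the norm does not depend on $\varepsilon$.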


Let us end this section with a comparison theorem, which is a rather direct
consequence of Proposition 7.4.7.


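Proposition 7.4.10 Consider independent symmetric r.v.s $\xi_i$, $\theta_i$, and characters $\chi_i$. Let us assume (7.31) and that the following holds for a certain constant $C$:

$$\forall i,\ \forall u \ge C,\quad \mathrm{P}(|\theta_i| \ge u) \ge \mathrm{P}(|\xi_i| \ge Cu)\,, \tag{7.50}$$

$$\forall i,\quad \mathrm{E}|\theta_i| \ge 1/C\,. \tag{7.51}$$

Then for numbers $(a_i)$ we have

$$\mathrm{E}\Big\|\sum_ia_i\xi_i\chi_i\Big\| \le K\,\mathrm{E}\Big\|\sum_ia_i\theta_i\chi_i\Big\|\,, \tag{7.52}$$

where $K$ depends on $C$ only.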


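Proof A main ingredient of the proof is that (7.50) implies that there is a joint realization$^{21}$ of the pairs $(|\xi_i|, |\theta_i|)$ such that $|\xi_i| \le K(|\theta_i| + 1)$. As I do not want to struggle with irrelevant technicalities, I will prove this only when the distributions of $|\xi_i|$ and $|\theta_i|$ have no atoms. Then one simply takes $|\xi_i| = f_i(|\theta_i|)$ where $f_i(t)$ is the smallest number such that $\mathrm{P}(|\xi_i| \ge f_i(t)) = \mathrm{P}(|\theta_i| \ge t)$. Thus it follows from (7.50) that $f_i(t) \le Ct$ for $t \ge C$, and since $f_i$ is increasing, we have $f_i(t) \le C^2$ for $t \le C$, so that $f_i(t) \le K(t+1)$.

A second main ingredient is that if $|b_i| \le |c_i|$ then

$$\mathrm{E}\Big\|\sum_ib_i\varepsilon_i\chi_i\Big\| \le K\,\mathrm{E}\Big\|\sum_ic_i\varepsilon_i\chi_i\Big\|\,. \tag{7.53}$$

This follows from (7.48). Let us denote by $\mathrm{E}_\varepsilon$ expectation in the Bernoulli r.v.s $(\varepsilon_i)$ only. We then write, using (7.53) in the first inequality and the triangle inequality in the second one,

$$\begin{aligned}\mathrm{E}_\varepsilon\Big\|\sum_ia_i\varepsilon_i|\xi_i|\chi_i\Big\| &= \mathrm{E}_\varepsilon\Big\|\sum_ia_i\varepsilon_if_i(|\theta_i|)\chi_i\Big\| \le K\,\mathrm{E}_\varepsilon\Big\|\sum_ia_i\varepsilon_i(|\theta_i| + 1)\chi_i\Big\|\\ &\le K\,\mathrm{E}_\varepsilon\Big\|\sum_ia_i\varepsilon_i|\theta_i|\chi_i\Big\| + K\,\mathrm{E}_\varepsilon\Big\|\sum_ia_i\varepsilon_i\chi_i\Big\|\,.\end{aligned} \tag{7.54}$$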


21 Also called a coupling. 
Now, we assume in (7.51) that $\mathrm{E}|\theta_i| \ge 1/C$, and it then follows from (7.48) that $\mathrm{E}\|\sum_ia_i\varepsilon_i\chi_i\| \le K\,\mathrm{E}\|\sum_ia_i\theta_i\chi_i\|$. Finally since the $(\xi_i)$ are independent symmetric, the sequences $(\xi_i)$ and $(\varepsilon_i|\xi_i|)$ have the same distribution (and similarly for the $\theta_i$), so that taking expectation in (7.54) finishes the proof. □
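The coupling used in this proof is the classical inverse-tail (quantile) transform: set $|\xi_i| = f_i(|\theta_i|)$ with $\mathrm{P}(|\xi_i|\ge f_i(t)) = \mathrm{P}(|\theta_i|\ge t)$. The sketch below computes $f$ by bisection for two hypothetical atomless laws, $|\xi|$ half-normal and $|\theta|$ standard exponential. These satisfy (7.50) with $C=1$ (one can check that $\operatorname{erfc}(u/\sqrt2)\le e^{-u}$ for $u\ge1$), so the argument predicts $f(t)\le t$ for $t\ge1$ and $f(t)\le t+1$ everywhere.

```python
import math


def tail_xi(x):
    """P(|xi| >= x) for |xi| = |N(0, 1)| (a hypothetical choice)."""
    return 1.0 if x <= 0 else math.erfc(x / math.sqrt(2))


def tail_theta(t):
    """P(|theta| >= t) for |theta| ~ Exp(1) (a hypothetical choice)."""
    return 1.0 if t <= 0 else math.exp(-t)


def coupling_map(t, lo=0.0, hi=40.0, iters=200):
    """f(t) solving P(|xi| >= f(t)) = P(|theta| >= t) by bisection; for atomless laws,
    |xi| := f(|theta|) then has the prescribed law of |xi|."""
    target = tail_theta(t)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if tail_xi(mid) > target:  # tail still too heavy: f(t) lies to the right of mid
            lo = mid
        else:
            hi = mid
    return hi
```

Because $f$ is increasing, $\mathrm{P}(f(|\theta|)\ge f(t)) = \mathrm{P}(|\theta|\ge t) = \mathrm{P}(|\xi|\ge f(t))$, so the coupled variable has the right distribution while being pointwise dominated by $K(|\theta|+1)$.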


7.5 Statement of Main Results 


7.5.1 General Setting 


In the rest of this chapter, we complete the program outlined in Sect. 7.2 of finding upper and lower bounds of the same order for the quantities $\|\sum_i\xi_i\chi_i\|$ where the $\chi_i$ are characters and the $\xi_i$ are independent symmetric r.v.s. (Let us stress that no moment conditions whatsoever are now required on the variables $\xi_i$.) As a consequence we obtain necessary and sufficient conditions for the convergence of random Fourier series in a very general setting (and in particular for the series (7.2)). These characterizations are in essence of the same nature as the results of Marcus and Pisier. Unfortunately this means that it is not always immediate to apply them in concrete situations, but we will illustrate at length how this can be done. Fulfilling this program requires a key conceptual advance compared to the work of Sect. 7.3.2: the idea of "families of distances", which is one of the central themes of this work.

We will consider random sums of functions on $T$ which are more general than the sums $\sum_i\xi_i\chi_i$ (where the $\xi_i$ are independent symmetric r.v.s) which we have been considering up to this point. This extra generality offers no difficulty whatsoever while covering other interesting situations, such as the case of "harmonizable infinitely divisible processes" to be considered in Chap. 12.

We describe our setting now. We assume as in Sect. 7.2 that $T$ is a compact metrizable Abelian group with Haar measure $\mu$.$^{22}$ We denote by $G$ its dual group, i.e., the set of continuous characters on $T$, and by $\mathbb CG$ the set of functions of the type $a\chi$ where $a\in\mathbb C$ and $\chi\in G$.$^{23}$

Consider independent r.v.s $Z_i$ valued in $\mathbb CG$, so that $Z_i$ is a random function on $T$. The crucial property is

$$\forall s,t\in T,\quad |Z_i(s) - Z_i(t)| = |Z_i(s-t) - Z_i(0)|\,, \tag{7.55}$$

which holds since it holds for characters by (7.8).

Our purpose is to study random trigonometric sums of the type $\sum_i\varepsilon_iZ_i$ where the $\varepsilon_i$ are independent Bernoulli r.v.s, independent of the $Z_i$. This amounts to considering sums of the type $\sum_iZ_i$ where the r.v.s $Z_i$ are independent symmetric.$^{24}$ We set

$$X_t = \sum_i\varepsilon_iZ_i(t)\,. \tag{7.56}$$

22 It requires only simple changes to treat the case where $T$ is only locally compact and not necessarily metrizable.

23 Please note that $\mathbb CG$ is not a vector space!


In particular we aim to study the r.v.

$$\sup_{t\in T}|X_t| = \Big\|\sum_i\varepsilon_iZ_i\Big\|\,, \tag{7.57}$$

where $\|\cdot\|$ denotes the supremum norm in the space of continuous functions on $T$, and specifically to find "upper and lower bounds" on this r.v.

Recalling that $\mathbf 1$ denotes the character everywhere equal to 1, in order to avoid trivial situations, we always assume the following:

$$\forall i,\quad Z_i \notin \mathbb C\mathbf 1\ \text{a.s.} \tag{7.58}$$

This exactly corresponds to our condition (7.31) of assuming $\chi_i \ne \mathbf 1$ in the preceding section.

To give a concrete example, let us consider characters $\chi_i$ with $\chi_i \ne \mathbf 1$ and real-valued symmetric r.v.s $\xi_i$, only finitely many of which are not 0. The r.v.s $Z_i = \xi_i\chi_i$ are valued in $\mathbb CG$ and are independent symmetric (since this is the case for the r.v.s $\xi_i$), so that the quantity (7.57) reduces to

$$\Big\|\sum_i\varepsilon_i\xi_i\chi_i\Big\|\,. \tag{7.59}$$


We provided a partial answer to the question of bounding from above the quantity (7.59) in Sect. 7.3.2 under the condition that $\mathrm{E}\xi_i^2 < \infty$ for each $i$. However the r.v.s $\xi_i$ might have "fat tails", and it will turn out that the size of these tails governs the size of the quantity (7.59). The results we will prove allow us to control the quantity (7.59) without assuming that $\mathrm{E}\xi_i^2 < \infty$.

The leading idea of our approach is to work conditionally on the r.v.s $Z_i$. We will detail later how this is done (following the procedure described on page 214), but the point is simple: when we fix the points $Z_i\in\mathbb CG$, then $Z_i = a_i\chi_i$ where $a_i$ is a given (nonrandom) number and $\chi_i$ is a given (nonrandom) character. This is the essential property of the $Z_i$. In this manner we basically reduce the study of the general sums $\sum_i\varepsilon_iZ_i$ to the study of the much simpler sums $\sum_i\varepsilon_ia_i\chi_i$ (but this reduction is by no means routine).


24 We refer the reader to, e.g., Proposition 8.1.5 of [33] for a detailed study of a “symmetrization 
procedure” in the setting of random Fourier series, showing that the symmetric case is the important 
one. 
This is how we will obtain lower bounds, using (7.44). Upper bounds are more
delicate. We originally discovered our upper bounds using chaining, but we will
present a simpler argument (which is somewhat specific to random Fourier series).

We end this section by commenting a bit on what the random function $Z_i$ valued in $\mathbb CG$ looks like. By definition of $\mathbb CG$, we have $Z_i = \xi_i\chi_i$ where $\xi_i$ is a complex-valued r.v. and $\chi_i$ is a random character, but we do not assume that $\xi_i$ and $\chi_i$ are independent r.v.s. Since for a character $\chi$ we have $\chi(0) = 1$, we simply have $\xi_i = Z_i(0)$. Let us describe the situation more precisely.$^{25}$ Since we assume that $T$ is metrizable, its dual group $G$ is countable (as follows from Lemma 7.3.6 and the fact that $L^2(T,\mu)$ is separable). Recalling that $\mathbf 1$ denotes the character equal to 1 everywhere,$^{26}$ we can enumerate $G\setminus\{\mathbf 1\}$ as a sequence $(\chi_\ell)_{\ell\ge1}$, and (7.58) implies that a.s. we have $Z_i\in\bigcup_{\ell\ge1}\mathbb C\chi_\ell$. Unless $Z_i = 0$, there is a unique $\ell\ge1$ such that $Z_i\in\mathbb C\chi_\ell$. Therefore $Z_i = \sum_{\ell\ge1}Z_i\mathbf 1_{\{Z_i\in\mathbb C\chi_\ell\}}$. When $Z_i\in\mathbb C\chi_\ell$ we have $Z_i = Z_i(0)\chi_\ell$, so that we have the expression

$$Z_i = \sum_{\ell\ge1}\xi_{i,\ell}\chi_\ell\,, \tag{7.60}$$

where

$$\xi_{i,\ell} = Z_i(0)\mathbf 1_{\{Z_i\in\mathbb C\chi_\ell\}}\,.$$

The important point in (7.60) is that the r.v.s $(\xi_{i,\ell})_{\ell\ge1}$ "have disjoint support": if $\ell\ne\ell'$ then $\xi_{i,\ell}\xi_{i,\ell'} = 0$ a.s., so for any realization of the randomness, in the sum $Z_i = \sum_{\ell\ge1}\xi_{i,\ell}\chi_\ell$, there is at most one non-zero term.


7.5.2 Families of Distances 


We expect that a control from above of the quantity (7.57) implies a kind of smallness of $T$. But smallness with respect to what? An obvious idea would be to consider the distance defined by

$$d(s,t)^2 = \sum_i\mathrm{E}|Z_i(s) - Z_i(t)|^2\,. \tag{7.61}$$

Unfortunately such a formula gives too much weight to the large values of $Z_i$. To remove the influence of these large values, we need to truncate. At which level should we truncate? As you might guess, there is no free lunch, and we need to consider truncations at all possible levels. That is, for $s,t\in T$ and $u > 0$, we consider the quantities

$$\varphi(s,t,u) = \sum_i\mathrm{E}\big(|u(Z_i(s) - Z_i(t))|^2\wedge1\big)\,, \tag{7.62}$$

where $x\wedge1 = \min(x,1)$.$^{27}$ Given a number $r\ge2$, for $j\in\mathbb Z$, we define

$$\varphi_j(s,t) = \varphi(s,t,r^j) = \sum_i\mathrm{E}\big(|r^j(Z_i(s) - Z_i(t))|^2\wedge1\big)\,. \tag{7.63}$$

25 This description will not be used before Sect. 7.9.

26 Which happens to be the unit of $G$.


Thus, $\varphi_j$ is the square of a translation-invariant distance on $T$. The "family of distances"$^{28}$ $(\varphi_j)$ is appropriate to estimate the quantity (7.57). For the purposes of this section, it suffices to consider the case $r = 2$. Other values of $r$ are useful for related purposes, so for consistency we allow the case $r\ge2$ (which changes nothing in the proofs). We observe that $\varphi_{j+1}\ge\varphi_j$.
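To make the definition (7.63) concrete, the sketch below estimates $\varphi_j(s,t)$ by Monte Carlo on $T=\mathbb Z_N$ for a hypothetical model $Z_i=\xi_i\chi_{k_i}$ with symmetric Cauchy $\xi_i$. For this model $\mathrm{E}|Z_i(s)-Z_i(t)|^2=\infty$ whenever $\chi_{k_i}(s)\ne\chi_{k_i}(t)$, so the distance (7.61) is useless, while every $\varphi_j$ is finite and at most the number of terms. The sketch also exhibits the monotonicity $\varphi_j\le\varphi_{j+1}$ and the translation invariance coming from (7.55). All names and parameters are illustrative assumptions.

```python
import cmath
import math
import random


def make_samples(freqs, n_samples, rng):
    """Draws of (xi_i, k_i) for the hypothetical model Z_i = xi_i chi_{k_i} on Z_N,
    with xi_i symmetric Cauchy (no finite moments)."""
    return [[(math.tan(math.pi * (rng.random() - 0.5)), k) for k in freqs] for _ in range(n_samples)]


def phi_j(j, s, t, samples, N, r=2.0):
    """Monte Carlo estimate of phi_j(s,t) = sum_i E(|r^j (Z_i(s) - Z_i(t))|^2 min 1), cf. (7.63)."""
    total = 0.0
    for draw in samples:
        for xi, k in draw:
            diff = xi * (cmath.exp(2j * math.pi * k * s / N) - cmath.exp(2j * math.pi * k * t / N))
            total += min(abs(r ** j * diff) ** 2, 1.0)  # truncation at level 1
    return total / len(samples)


rng = random.Random(6)
N = 12
samples = make_samples([1, 3, 4, 7], 300, rng)
```

Each summand is bounded by 1, which is what makes the random analogues $\psi_j$ concentrate around $\varphi_j$.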

The concept of family of distances may be disturbing at first, but once one gets used to it, it is not any harder than working with one single distance. There is a foolproof rule:

To make sense of a statement involving a family of distances,

$$\text{pretend that } \varphi_j(s,t) = r^{2j}d(s,t)^2 \text{ for a given distance } d\,. \tag{7.64}$$


To motivate this rule, observe that if we were to disregard the truncation in (7.62), then by (7.61) we would indeed have $\varphi_j(s,t) = r^{2j}d(s,t)^2$.

It has been clear at least since we stated Theorem 2.7.14 that families of distances will be relevant in bounding stochastic processes. One may ask then if there is a simple way to understand why the family of distances (7.63) is relevant here. This will become apparent while going through the mechanism of the proofs, but we can already stress some features. The right-hand side of (7.63) is the expected value of a sum of an independent family of positive r.v.s, all bounded by 1. Elementary considerations show that such sums are strongly concentrated around their expectations (as expressed in Lemma 7.7.2). Taking advantage of that fact alone, we will show that if we have a control (either from above or from below) of the size of $T$ as measured in an appropriate way by the family of distances (7.63), then for the typical choice of the randomness of the $Z_i$, we also have a similar control of the size of $T$ for the family of random distances $\psi_j$ given by $\psi_j(s,t) = \sum_i|r^j(Z_i(s) - Z_i(t))|^2\wedge1$. In this manner we really reduce the entire problem of studying random trigonometric sums to the case of the sums $\sum_i\varepsilon_ia_i\chi_i$.


27 As you may notice, writing (7.62) is not the most immediate way to truncate, but you will soon understand
the advantages of using this formulation.

28 This is a very dumb name since they are not distances but squares of distances and since it is 
more a sequence than a family. I am not going to change it. 
It is also apparent how to produce lower bounds. Recalling that when we work given the $Z_i$ these are of the type $a_i\chi_i$, this is based on the obvious fact

$$\psi_j(s,t) = \sum_i|r^j(Z_i(s) - Z_i(t))|^2\wedge1 = \sum_i|r^ja_i(\chi_i(s) - \chi_i(t))|^2\wedge1 \le r^{2j}d(s,t)^2\,,$$

where $d(s,t)^2 = \sum_i|a_i(\chi_i(s) - \chi_i(t))|^2$ is the distance (7.14). We can then expect that lower bounds on the family $\psi_j(s,t)$ will produce lower bounds on the distance $d$ and in turn through (7.44) lower bounds on the random trigonometric sum $\sum_i\varepsilon_ia_i\chi_i$.
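Once the $Z_i = a_i\chi_i$ are fixed, the inequality $\psi_j\le r^{2j}d^2$ holds term by term, since $x\wedge1\le x$, and also $\psi_j$ never exceeds the number of terms. The sketch below checks both facts on $\mathbb Z_{12}$ for hypothetical pairs $(k_i,a_i)$, with $r=2$ as suggested for this section; the function names are ours.

```python
import cmath
import math


def psi_j(j, s, t, coeffs, N, r=2.0):
    """psi_j(s,t) = sum_i |r^j a_i (chi_i(s) - chi_i(t))|^2 min 1 for fixed pairs (k_i, a_i) on Z_N."""
    return sum(min(abs(r ** j * a * (cmath.exp(2j * math.pi * k * s / N)
                                     - cmath.exp(2j * math.pi * k * t / N))) ** 2, 1.0)
               for k, a in coeffs)


def d_sq(s, t, coeffs, N):
    """The distance (7.14): d(s,t)^2 = sum_i |a_i (chi_i(s) - chi_i(t))|^2."""
    return sum(abs(a * (cmath.exp(2j * math.pi * k * s / N) - cmath.exp(2j * math.pi * k * t / N))) ** 2
               for k, a in coeffs)


N = 12
coeffs = [(1, 0.7), (3, -2.0 + 0.5j), (5, 0.01), (3, 1.5j)]  # hypothetical realizations a_i chi_i
```

For large $j$ the truncation makes $\psi_j$ saturate at the number of terms, while $r^{2j}d^2$ keeps growing; this is where the two quantities genuinely differ.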

It is far less apparent why the distances $\psi_j(s,t)$ also suffice to obtain upper bounds, and we have no magic explanation to offer here. We know two very different proofs of this fact, and neither of them is very intuitive. Let us only observe that in writing (7.62) we disregard some information about the large values of $Z_i$, but we will control these large values simply because there are very few of them (a finite number in the setting of infinite series).


7.5.3 Lower Bounds 


Our first result is a lower bound for the sum $\|\sum_i\varepsilon_iZ_i\|$ (although it will not be
immediately obvious that this is a lower bound).


Theorem 7.5.1 There exists a universal constant $\alpha_0$ with the following property. Assume that for some $M > 0$ we have

$$\mathrm{P}\Big(\Big\|\sum_i\varepsilon_iZ_i\Big\| \ge M\Big) \le \alpha_0\,. \tag{7.65}$$

Then for $n\ge0$ we can find numbers $j_n\in\mathbb Z$ such that$^{29}$

$$\forall s,t\in T,\quad \varphi_{j_0}(s,t) \le 1 \tag{7.66}$$

and that for each $n\ge1$

$$\mu(\{s;\ \varphi_{j_n}(s,0) \le 2^n\}) \ge 2^{-2^n} = N_n^{-1}\,, \tag{7.67}$$

for which

$$\sum_{n\ge0}2^nr^{-j_n} \le KM\,. \tag{7.68}$$

29 The number 1 in the right-hand side of (7.66) does not play any special role and can be replaced by any other constant $> 0$.


For a first understanding of this result, recall that $\varphi_j\le\varphi_{j+1}$, so that the conditions (7.66) and (7.67) are easier to satisfy for small values of the $j_n$. On the other hand, (7.68) states that we can satisfy these conditions by values of the $j_n$ which are not too small.

Conditions (7.66) and (7.67) are very important, and it is extremely useful to consider the largest integers which satisfy them. Throughout this chapter, when dealing with trigonometric sums, we will use the notation

$$\bar j_0 = \sup\{j\in\mathbb Z;\ \forall s,t\in T,\ \varphi_j(s,t)\le1\}\in\mathbb Z\cup\{\infty\}\,. \tag{7.69}$$


This definition makes sense because the set on the right is never empty. Indeed, since $|Z_i(t)| = |Z_i(0)|$,

$$\varphi_j(s,t) \le \sum_i\mathrm{E}\big(4r^{2j}|Z_i(0)|^2\wedge1\big)\,,$$

and since the sum is finite, it follows from dominated convergence that the limit of the right-hand side as $j\to-\infty$ is zero, and thus there exists $j$ for which $\sup_{s,t\in T}\varphi_j(s,t)\le1$. Similarly for $n\ge1$, we define

$$\bar j_n = \sup\{j\in\mathbb Z;\ \mu(\{s;\ \varphi_j(s,0)\le2^n\})\ge2^{-2^n} = N_n^{-1}\}\in\mathbb Z\cup\{\infty\}\,. \tag{7.70}$$


One may like to think of $r^{-\bar j_0}$ as a substitute for the diameter of $T$ and of $2^{n/2}r^{-\bar j_n}$ as a substitute for the entropy number $\epsilon_n$ (as will become gradually apparent). We can now make clear why Theorem 7.5.1 is a lower bound.


Corollary 7.5.2 There exists a universal constant $\alpha > 0$ such that

$$\mathrm{P}\Big(\Big\|\sum_i\varepsilon_iZ_i\Big\| \ge \frac1K\sum_{n\ge0}2^nr^{-\bar j_n}\Big) \ge \alpha\,, \tag{7.71}$$

where $K$ depends on $r$ only.

Proof Set $M = (2K_0)^{-1}\sum_{n\ge0}2^nr^{-\bar j_n}$, where $K_0$ is the constant of (7.68). Assume for contradiction that (7.65) holds, and consider the numbers $j_n$ provided by Theorem 7.5.1, so that $j_n\le\bar j_n$ and

$$K_0M = \frac12\sum_{n\ge0}2^nr^{-\bar j_n} < \sum_{n\ge0}2^nr^{-j_n}\,.$$

This contradicts (7.68), proving that (7.65) does not hold, so that (7.71) holds. □
Let us note the following consequence of (7.71) and (2.7):

$$\frac1K \sum_{n \ge 0} 2^n r^{-\bar j_n} \le \mathsf E\Big\|\sum_i \varepsilon_i Z_i\Big\| . \tag{7.72}$$

To understand the nature of the sum $\sum_{n\ge0} 2^n r^{-\bar j_n}$, let us apply the strategy (7.64), assuming that $\varphi_j(s,t) = r^{2j} d(s,t)^2$. Then

$$\bar j_n = \sup\{j \in \mathbb Z ;\ \mu(\{s ;\ d(s,0) \le 2^{n/2} r^{-j}\}) \ge N_n^{-1}\} ,$$

and thus if $\epsilon_n$ is defined as in (7.3), we see that $2^{n/2} r^{-\bar j_n} \approx \epsilon_n$, so that $2^n r^{-\bar j_n} \approx 2^{n/2}\epsilon_n$.


The quantity

$$\sum_{n \ge 0} 2^n r^{-\bar j_n}$$

appears as the natural substitute for the familiar entropy integral

$$\sum_{n \ge 0} 2^{n/2} \epsilon_n . \tag{7.73}$$

In (7.69), in the condition $\varphi_j(s,t) \le 1$, the number 1 can be replaced by any other positive number, provided of course that one changes the constant in (7.71). On page 232 the reader will find a computation of the quantity $\sum_{n\ge0} 2^n r^{-\bar j_n}$ in some simple cases.

It is important to understand the next example and its consequences. 


Exercise 7.5.3 Assume that $\sum_i \mathsf P(Z_i \ne 0) \le 1$. Prove that $\bar j_n = \infty$ for each $n$ and that (7.71) brings no information.


It would be a devastating misunderstanding to conclude from this example 
that (7.71) is a “weak result”. The real meaning is more subtle: (7.71) does not 
bring much information on certain sums, but these are of a very special type. The 
decomposition theorem (Theorem 7.5.14) states that a general random trigonometric 
sum is the sum of two pieces. For one of these (7.71) captures exact information, 
and the other piece is of a very special type. This will be a feature of our main 
results in Chap. 11. 

One of the main results of Chap. 11 will be a considerable generalization of (7.72) 
to a setting where there is no translation invariance. Then we will not be able to use 
covering numbers (as is being done in a sense in (7.72)). The next exercise will help 
you understand the formulation we will adopt there. 


Exercise 7.5.4

(a) Prove that for $s,t,u \in T$ and any $j$, we have

$$\varphi_j(s,t) \le 2\big(\varphi_j(s,u) + \varphi_j(u,t)\big) .$$

(b) Consider a subset $D$ of $T$ and assume that for any $s \in D$ we have $\varphi_j(s,0) \le d$ for a certain number $d$. Prove that if $s$ and $t$ belong to the same translate of $D - D$, we have $\varphi_j(s,t) \le 4d$.

(c) Prove that under the conditions (7.66)–(7.68), we can find an admissible sequence $(\mathcal A_n)$ of partitions of $T$ with the following property: If $n \ge 1$,

$$s,t \in A \in \mathcal A_n \Rightarrow \varphi_{j_{n-1}}(s,t) \le 2^{n+1} . \tag{7.74}$$

Hint: Use Exercise 2.7.6 and Lemma 7.1.3.


7.5.4 Upper Bounds 


Let us turn to upper bounds. In order to avoid technical statements at this stage, we will assume that there is enough integrability that the size of $\|\sum_i \varepsilon_i Z_i\|$ can be measured by its expectation $\mathsf E\|\sum_i \varepsilon_i Z_i\|$.$^{30}$ Corollary 7.5.2 states that the typical value of $\|\sum_i \varepsilon_i Z_i\|$ controls from above the "entropy integral" $\sum_{n\ge0} 2^n r^{-\bar j_n}$. In the reverse direction, since the quantities $\bar j_n$ say nothing about the large values of $Z_i$, we cannot expect that the "entropy integral" $\sum_{n\ge0} 2^n r^{-\bar j_n}$ will control the tails of the r.v. $\|\sum_i \varepsilon_i Z_i\|$. However, as the following expresses, we control the size of $\|\sum_i \varepsilon_i Z_i\|$ as soon as we control the "entropy integral" $\sum_{n\ge0} 2^n r^{-\bar j_n}$ and the size of the single r.v. $\sum_i \varepsilon_i Z_i(0)$.


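Theorem 7.5.5 For $n \ge 0$ consider numbers $j_n \in \mathbb Z$ which satisfy (7.66) and (7.67). Then, for any $p \ge 1$, we have

$$\Big(\mathsf E\Big\|\sum_i \varepsilon_i Z_i\Big\|^p\Big)^{1/p} \le K\Big(\sum_{n\ge0} 2^n r^{-j_n} + \Big(\mathsf E\Big|\sum_i \varepsilon_i Z_i(0)\Big|^p\Big)^{1/p}\Big) , \tag{7.75}$$

where $K$ depends only on $r$ and $p$.

Of course in (7.75), the larger $j_n$, the better bound we get. The best bound is obtained for $j_n = \bar j_n$.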


7.5.5 Highlighting the Magic of Theorem 7.5.5 


The key to Theorem 7.5.5 is the case where $Z_i = a_i \chi_i$ is a constant multiple of a nonrandom character. We investigated this situation in detail in Sect. 7.4, but Theorem 7.5.5 provides crucial new information, and we explain this now. In that case, the functions $\varphi_j$ of (7.63) are given by

$$\varphi_j(s,t) = \sum_i \big(|r^j a_i(\chi_i(s) - \chi_i(t))|^2 \wedge 1\big) . \tag{7.76}$$

30 More general statements which assume no integrability will be given in Sect. 7.8.


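Thus, assuming the conditions (7.66) and (7.67) and using (7.75) for $p = 1$ yields the bound

$$\mathsf E\Big\|\sum_i \varepsilon_i a_i \chi_i\Big\| \le K \sum_{n\ge0} 2^n r^{-j_n} + K\Big(\sum_i |a_i|^2\Big)^{1/2} . \tag{7.77}$$

Consider the distance $d$ of (7.22), given by $d(s,t)^2 = \sum_i |a_i|^2 |\chi_i(s) - \chi_i(t)|^2$, and define

$$\epsilon_n = \inf\{\epsilon > 0 ;\ \mu(B_d(0,\epsilon)) \ge N_n^{-1}\} . \tag{7.78}$$

It then follows from (7.44) and (7.4) that

$$\frac1K \sum_{n\ge0} 2^{n/2}\epsilon_n \le \mathsf E\Big\|\sum_i \varepsilon_i a_i \chi_i\Big\| \le K \sum_{n\ge0} 2^{n/2}\epsilon_n . \tag{7.79}$$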


We have reached here a very unusual situation: it is by no means obvious to compare the upper bounds in (7.77) and (7.79). In particular, combining these inequalities we reach the following:

$$\sum_{n\ge0} 2^{n/2}\epsilon_n \le K \sum_{n\ge0} 2^n r^{-j_n} + K\Big(\sum_i |a_i|^2\Big)^{1/2} , \tag{7.80}$$

where $K$ depends on $r$ only, but we do not know how to give a direct proof of this inequality. The quantities appearing in this inequality involve only characters and complex numbers. There are no random variables involved, so why should one use random trigonometric sums to prove it?


Research Problem 7.5.6 Find a proof of (7.80) which does not use random 
trigonometric sums. 


While the inequality (7.80) is mysterious, the reverse inequality is very clear. First, we proved in Sect. 7.3.3 (assuming $\chi_i \ne 1$ for each $i$) that $\sum_i |a_i|^2 \le L\Delta(T,d)^2 = L\epsilon_0^2$. Next, keeping (7.32) in mind, we prove that

$$\sum_{n\ge0} 2^n r^{-\bar j_n} \le K \sum_{n\ge0} 2^{n/2}\epsilon_n . \tag{7.81}$$

When $\bar j_n$ is finite, we have

$$\mu(\{s \in T ;\ \varphi_{\bar j_n+1}(s,0) \le 2^n\}) < N_n^{-1} . \tag{7.82}$$

Since obviously $\varphi_j(s,t) \le r^{2j} d(s,t)^2$, we have

$$B_d(0, 2^{n/2} r^{-\bar j_n - 1}) \subset \{s \in T ;\ \varphi_{\bar j_n+1}(s,0) \le 2^n\} .$$

Combining with (7.82) and (7.78), this proves that $2^{n/2} r^{-\bar j_n - 1} \le \epsilon_n$, and (7.81).
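The inclusion argument above can also be checked numerically. The sketch below reuses a hypothetical toy setting (an assumption made only for illustration, not taken from the text): the finite cyclic group $\mathbb Z/m\mathbb Z$ with uniform measure and nonrandom $Z_k = a_k\chi_k$. It computes $\bar j_n$ from (7.76), (7.69) and (7.70), computes $\epsilon_n$ from (7.78), and tests the inequality $2^{n/2} r^{-\bar j_n - 1} \le \epsilon_n$ that underlies (7.81). All names and coefficient choices are ours.

```python
import cmath
import math


def toy_weights(m, coeffs):
    """w[s][k] = |a_k|^2 |chi_k(s) - 1|^2 on Z/mZ; reduce k*s mod m so exact zeros stay zero."""
    return [[abs(a) ** 2 * abs(cmath.exp(2j * math.pi * ((k * s) % m) / m) - 1) ** 2
             for k, a in coeffs.items()]
            for s in range(m)]


def N(n):
    """N_0 = 1 and N_n = 2^(2^n) for n >= 1."""
    return 1 if n == 0 else 2 ** (2 ** n)


def bar_j(n, w, r, j_range=range(-40, 41)):
    """Largest j in j_range with (7.69) (n = 0) or (7.70) (n >= 1); math.inf if all qualify."""
    m, best = len(w), None
    for j in j_range:
        scale = float(r) ** (2 * j)
        phis = [sum(min(scale * x, 1.0) for x in w_s) for w_s in w]   # phi_j(s, 0), (7.76)
        if n == 0:
            ok = max(phis) <= 1.0
        else:
            ok = sum(p <= 2 ** n for p in phis) * N(n) >= m        # mu(...) >= 1/N_n, exactly
        if not ok:
            break
        best = j
    return math.inf if best == j_range[-1] else best


def eps(n, w):
    """epsilon_n of (7.78): smallest radius whose closed d-ball around 0 has mu >= 1/N_n."""
    m = len(w)
    dists = sorted(math.sqrt(sum(w_s)) for w_s in w)   # d(s, 0) for every s
    k = max(1, -(-m // N(n)))                          # ceil(m / N_n) points are needed
    return dists[k - 1]


def check_781(m, coeffs, r, n_max=6):
    """True if 2^(n/2) r^(-bar j_n - 1) <= epsilon_n for every n <= n_max with bar j_n finite."""
    w = toy_weights(m, coeffs)
    for n in range(n_max + 1):
        b = bar_j(n, w, r)
        if b == math.inf:
            continue
        lhs = 2 ** (n / 2) * float(r) ** (-b - 1)
        if lhs > eps(n, w) * (1 + 1e-9):               # small tolerance for rounding
            return False
    return True


if __name__ == "__main__":
    print(check_781(64, {k: 1.0 / k for k in range(1, 9)}, 4))
```

The relative tolerance only absorbs floating-point rounding at the boundary of the ball; the inequality itself is exact in the argument above.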

In summary of this discussion, we could argue that even though in retrospect (7.77) does not improve on (7.79), it is a better inequality, because it is quite obvious that its right-hand side is of smaller order than the right-hand side of (7.79), whereas the opposite is absolutely not obvious.


Exercise 7.5.7 Assume that $\varphi_j(s,t) < 1$ for each $s,t \in T$ and that $\chi_i \ne 1$ for all $i$. Prove that $\sum_i |a_i|^2 \le r^{-2j}/2$.


7.5.6 Combining Upper and Lower Bounds 


We can combine Corollary 7.5.2 and Theorem 7.5.5 to provide upper and lower bounds for $(\mathsf E\|\sum_i \varepsilon_i Z_i\|^p)^{1/p}$ that are of the same order. Let us state the result in the case of (7.59). From now on, $K$ denotes a number that depends only on $r$ and $p$ and that need not be the same on each occurrence.


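Theorem 7.5.8 Assume that the r.v.s $\xi_i$ are independent symmetric. If the numbers $\bar j_n$ are as in (7.69) and (7.70), then, for each $p \ge 1$,

$$\frac1K\Big(\sum_{n\ge0} 2^n r^{-\bar j_n} + \Big(\mathsf E\Big|\sum_i \xi_i\Big|^p\Big)^{1/p}\Big) \le \Big(\mathsf E\Big\|\sum_i \xi_i \chi_i\Big\|^p\Big)^{1/p} \le K\Big(\sum_{n\ge0} 2^n r^{-\bar j_n} + \Big(\mathsf E\Big|\sum_i \xi_i\Big|^p\Big)^{1/p}\Big) . \tag{7.83}$$

Not the least remarkable feature of this result is that it assumes nothing (beyond independence and symmetry) on the r.v.s $\xi_i$.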


7.5.7 An Example: Tails in $u^{-p}$


Explicit examples of application of these abstract theorems will be given in Sect. 7.12, but right away we illustrate Theorem 7.5.8 in some cases (investigated first by M. Marcus and G. Pisier [62]): having upper and lower bounds which hold in great generality unfortunately does not mean that it is always obvious to relate them to other known bounds. We consider complex numbers $a_i$ and the distance $d_p$ on $T$ defined by

$$d_p(s,t)^p = \sum_i |a_i(\chi_i(s) - \chi_i(t))|^p . \tag{7.84}$$


Proposition 7.5.9 Consider symmetric r.v.s $\theta_i$ which satisfy, for certain numbers $1 < p < 2$ and $C > 0$ and for all $u > 0$,

$$\mathsf P(|\theta_i| \ge u) \le C u^{-p} . \tag{7.85}$$

Assume that $\xi_i = a_i \theta_i$. Then we have

$$\sum_{n\ge0} 2^n r^{-\bar j_n} \le K \gamma_q(T, d_p) . \tag{7.86}$$

Here we use the notation of Corollary 7.5.2, $K$ denotes a constant depending only on $C$, $r$ and $p$, and $1/p + 1/q = 1$. The point of (7.86) is that it relates the quantity $\sum_{n\ge0} 2^n r^{-\bar j_n}$ with the more usual quantity $\gamma_q(T, d_p)$. The proof depends on two simple lemmas.


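Lemma 7.5.10 Under (7.85), for any $j \in \mathbb Z$ we have

$$\varphi_j(s,t) \le K r^{jp} d_p(s,t)^p . \tag{7.87}$$

Proof Using (7.85) in the second line we obtain that for $v \ne 0$,

$$\begin{aligned}\mathsf E(|v\theta_i|^2 \wedge 1) &= \int_0^1 \mathsf P(|v\theta_i|^2 \ge t)\,dt = \int_0^1 \mathsf P\Big(|\theta_i| \ge \frac{t^{1/2}}{|v|}\Big)\,dt\\ &\le \int_0^1 \frac{C|v|^p}{t^{p/2}}\,dt = K|v|^p ,\end{aligned}\tag{7.88}$$

where the last integral is finite because $p < 2$. Since $\xi_i = a_i\theta_i$, this implies $\mathsf E(|r^j \xi_i(\chi_i(s) - \chi_i(t))|^2 \wedge 1) \le K r^{jp} |a_i(\chi_i(s) - \chi_i(t))|^p$, and summation over $i$ yields the result. □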
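The computation (7.88) can be tested on a concrete example. The sketch below assumes, for illustration only, that $|\theta_i|$ has the Pareto-type tail $\mathsf P(|\theta_i| \ge u) = \min(1, u^{-p})$, so that (7.85) holds with $C = 1$. For this tail the integral in (7.88) equals $|v|^2 + |v|^p(1 - |v|^{2-p})/(1 - p/2)$ when $|v| < 1$, and equals 1 when $|v| \ge 1$. The code compares this closed form with a midpoint-rule evaluation of $\int_0^1 \mathsf P(|\theta_i| \ge t^{1/2}/|v|)\,dt$ and with the bound $K|v|^p$, $K = 1/(1 - p/2)$.

```python
def tail(u, p):
    """Assumed tail P(|theta| >= u) = min(1, u^(-p)); it satisfies (7.85) with C = 1."""
    return 1.0 if u <= 1.0 else u ** (-p)


def lhs_quadrature(v, p, steps=50_000):
    """Midpoint rule for E(|v theta|^2 ^ 1) = int_0^1 P(|theta| >= sqrt(t)/|v|) dt."""
    h = 1.0 / steps
    return h * sum(tail(((i + 0.5) * h) ** 0.5 / abs(v), p) for i in range(steps))


def lhs_closed_form(v, p):
    """Exact value of the same integral for the assumed tail."""
    a = abs(v)
    if a >= 1.0:
        return 1.0
    return a ** 2 + a ** p * (1.0 - a ** (2.0 - p)) / (1.0 - p / 2.0)


def bound(v, p):
    """Right-hand side K|v|^p of (7.88) with C = 1, i.e. K = 1 / (1 - p/2)."""
    return abs(v) ** p / (1.0 - p / 2.0)


if __name__ == "__main__":
    for p in (1.2, 1.5, 1.9):
        for v in (0.01, 0.5, 2.0):
            print(p, v, lhs_closed_form(v, p), lhs_quadrature(v, p), bound(v, p))
```

The constant $K = 1/(1-p/2)$ blows up as $p \to 2$, which is why Proposition 7.5.9 requires $p < 2$.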


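Lemma 7.5.11 Consider for $n \ge 0$ the numbers $\epsilon_n$ as in Theorem 7.1.1, for the distance $d_p$, i.e., $\epsilon_n = \inf\{\epsilon > 0 ;\ \mu(\{s ;\ d_p(s,0) \le \epsilon\}) \ge N_n^{-1}\}$. Then

$$2^{n/p} r^{-\bar j_n} \le K\epsilon_n . \tag{7.89}$$

Proof We may assume that $\bar j_n < \infty$. Since (7.87) gives $\{s ;\ d_p(s,0) \le 2^{n/p} r^{-\bar j_n - 1}/K\} \subset \{s ;\ \varphi_{\bar j_n+1}(s,0) \le 2^n\}$ and since $\mu(\{s ;\ \varphi_{\bar j_n+1}(s,0) \le 2^n\}) < N_n^{-1}$, we have $\mu(\{s ;\ d_p(s,0) \le 2^{n/p} r^{-\bar j_n-1}/K\}) < N_n^{-1}$, so that $2^{n/p} r^{-\bar j_n - 1}/K \le \epsilon_n$. □

Proof of Proposition 7.5.9 Consider for $n \ge 0$ the numbers $\epsilon_n$ as above, so that

$$\sum_{n\ge0} \epsilon_n 2^{n/q} \le K\gamma_q(T, d_p)$$

by (7.6). Since $2^n = 2^{n/p} 2^{n/q}$, the result follows by (7.89). □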
Exercise 7.5.12 Use (7.83) (taking $p = 1$ there) to prove that when $\chi_i \ne 1$ for each $i$ then

$$\mathsf E\Big\|\sum_i a_i\theta_i\chi_i\Big\| \le K\gamma_q(T, d_p) . \tag{7.90}$$

Hint: You have to prove that $\mathsf E|\sum_i a_i\theta_i| \le K(\sum_i |a_i|^p)^{1/p}$ and $(\sum_i |a_i|^p)^{1/p} \le K\Delta(T, d_p)$. The first inequality is elementary but rather tricky, and the second uses the methods of Sect. 7.3.3.
The following is a kind of converse to (7.90): 


Proposition 7.5.13 Consider $1 < p < 2$ and its conjugate number $q$. Consider independent symmetric r.v.s $(\theta_i)$, and assume that for some constant $C$ and all $i$, we have

$$u \ge C \Rightarrow \mathsf P(|\theta_i| \ge u) \ge \frac{1}{Cu^p} . \tag{7.91}$$

Assume also that $\chi_i \ne 1$ for each $i$. Then

$$\gamma_q(T, d_p) \le K\mathsf E\Big\|\sum_i a_i\theta_i\chi_i\Big\| , \tag{7.92}$$

where $K$ depends only on $C$.


Magic proof of Proposition 7.5.13$^{31}$ This proof uses the concept of $p$-stable r.v. which was described in Sect. 5.1. Consider an i.i.d. sequence $(\xi_i)$ of $p$-stable r.v.s $\xi_i$ with $\mathsf E|\xi_i| = 1$. It is then known that $\mathsf P(|\xi_i| \ge u) \le Ku^{-p}$ for $u > 0$, so that (7.50) holds, and therefore (7.52) holds. Now, Theorem 5.2.1 asserts that $\gamma_q(T, d_p) \le K\mathsf E\|\sum_i a_i\xi_i\chi_i\|$, and combining with (7.52) this finishes the proof. □

In Sect. 7.11 we give an arguably more natural proof which does not use $p$-stable r.v.s.


7.5.8 The Decomposition Theorem 


In this section we go back to the theme started in Sect. 6.8. We will prove later, in Theorem 11.10.3, that under rather general circumstances, a random series can be decomposed into two pieces, one of which can be controlled by chaining and one which can be controlled by ignoring cancellations between the terms of the series. This theorem applies in particular in the case of the random trigonometric sums we are considering here.$^{32}$ Unfortunately, the two terms of the decomposition of a random trigonometric sum constructed in Theorem 11.10.3 are not themselves random trigonometric sums. The following asserts that we can obtain a similar decomposition where the two terms of the decomposition are themselves random trigonometric sums. To understand this result, the reader should review Theorem 6.2.8, which is of a strikingly similar nature (but much more difficult).

31 Please see the comments about this proof in Sect. 7.14.


Theorem 7.5.14 Consider independent r.v.s $Z_i$ valued in $\mathcal CG$, and assume (7.58). Set $S = \mathsf E\|\sum_i \varepsilon_i Z_i\|$. Then there is a decomposition $Z_i = Z_i' + Z_i''$, where both $Z_i'$ and $Z_i''$ are in $\mathcal CG$, where each of the sequences $(Z_i')$ and $(Z_i'')$ is independent,$^{33}$ and these satisfy

$$\sum_i \mathsf E|Z_i'(0)| \le LS \tag{7.93}$$

and

$$\gamma_2(T, d) \le LS , \tag{7.94}$$

where the distance $d$ is given by $d(s,t)^2 = \sum_i \mathsf E|Z_i''(s) - Z_i''(t)|^2$. Furthermore, in the case of usual random trigonometric sums, when $Z_i = \xi_i\chi_i$, the decomposition takes the form $Z_i' = \xi_i'\chi_i$ and $Z_i'' = \xi_i''\chi_i$.


In the case of usual random series, this decomposition witnesses in a transparent way the size of $S = \mathsf E\|\sum_i \xi_i\chi_i\|$. Indeed, (7.93) makes it obvious that $\mathsf E\|\sum_i \xi_i'\chi_i\| \le LS$, whereas (7.33) and (7.94) imply that $\mathsf E\|\sum_i \xi_i''\chi_i\| \le L\gamma_2(T,d) \le LS$. The argument works just the same in general, as the following shows:

Exercise 7.5.15 Generalize (7.23) to the case of sums $\sum_i \varepsilon_i Z_i$. That is, prove that

$$\mathsf E\Big\|\sum_i \varepsilon_i Z_i\Big\| \le L\Big(\gamma_2(T,d) + \Big(\sum_i \mathsf E|Z_i(0)|^2\Big)^{1/2}\Big) ,$$

where

$$d(s,t)^2 = \sum_i \mathsf E|Z_i(s) - Z_i(t)|^2 .$$

32 The main ingredient of the proof in the case of random trigonometric series is Theorem 7.5.1, which is far easier than the corresponding result Theorem 11.7.1 in the general case.

33 But the sequences are not independent of each other.
7.5.9 Convergence

We state now our convergence theorems. We consider independent r.v.s $(Z_i)_{i\ge1}$ with $Z_i \in \mathcal CG$ and independent Bernoulli r.v.s $\varepsilon_i$ independent of the randomness of the $Z_i$. Throughout this section we say that the series $\sum_{i\ge1} \varepsilon_i Z_i$ converges a.s. if it converges a.s. in the Banach space of continuous functions on $T$ provided with the uniform norm.$^{34}$ We also recall that we assume (7.58).


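Theorem 7.5.16 The series $\sum_{i\ge1} \varepsilon_i Z_i$ converges a.s. if and only if the following occurs: There exists $j_0$ such that

$$\forall s,t \in T ,\quad \sum_{i\ge1} \mathsf E\big(|r^{j_0}(Z_i(s) - Z_i(t))|^2 \wedge 1\big) \le 1 , \tag{7.95}$$

and for $n \ge 1$ there exists $j_n \in \mathbb Z$ for which

$$\mu\Big(\Big\{s \in T ;\ \sum_{i\ge1} \mathsf E\big(|r^{j_n}(Z_i(s) - Z_i(0))|^2 \wedge 1\big) \le 2^n\Big\}\Big) \ge \frac{1}{N_n} , \tag{7.96}$$

such that

$$\sum_{n\ge0} 2^n r^{-j_n} < \infty . \tag{7.97}$$

Moreover, when these conditions are satisfied, for each $p \ge 1$ we have

$$\mathsf E\Big\|\sum_{i\ge1} \varepsilon_i Z_i\Big\|^p < \infty \iff \mathsf E\Big|\sum_{i\ge1} \varepsilon_i Z_i(0)\Big|^p < \infty .$$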

We have also the following, less concrete but more spectacular: 


Theorem 7.5.17 The series $\sum_{i\ge1} \varepsilon_i Z_i$ converges almost surely if and only if one may find a decomposition $Z_i = Z_i^1 + Z_i^2 + Z_i^3$ with the following properties. First, each of the sequences $(Z_i^\ell)_{i\ge1}$ for $\ell = 1,2,3$ is independent and valued in $\mathcal CG$. Next

$$\sum_{i\ge1} \mathsf P(Z_i^1 \ne 0) < \infty , \tag{7.98}$$

$$\sum_{i\ge1} \mathsf E|Z_i^2(0)| < \infty , \tag{7.99}$$

$$\gamma_2(T, d) < \infty , \tag{7.100}$$

where the distance $d$ is given by $d(s,t)^2 = \sum_{i\ge1} \mathsf E|Z_i^3(s) - Z_i^3(t)|^2$. Furthermore, when $Z_i = \xi_i\chi_i$ there exists a decomposition $\xi_i = \xi_i^1 + \xi_i^2 + \xi_i^3$ such that for $\ell \le 3$ we have $Z_i^\ell = \xi_i^\ell\chi_i$.

34 This is the most natural notion of convergence. A classical theorem of Billard (see [40, Theorem 3, p. 58]) relates different notions of convergence of a random Fourier series.


The necessary conditions and the sufficient conditions stated in the next theorem are due to Marcus and Pisier [62] and were known long before the more general Theorem 7.5.16. We will show how to deduce them from that result.$^{35}$


Theorem 7.5.18

(a) For $i \ge 1$ consider characters $\chi_i$ and numbers $a_i$. Then the series $\sum_{i\ge1} a_i\varepsilon_i\chi_i$ converges almost surely if and only if $\gamma_2(T, d_2) < \infty$, where $d_2$ is the distance given by

$$d_2(s,t)^2 = \sum_{i\ge1} |a_i|^2 |\chi_i(s) - \chi_i(t)|^2 . \tag{7.101}$$

(b) Consider independent symmetric r.v.s $(\theta_i)_{i\ge1}$ and numbers $(a_i)_{i\ge1}$. Consider a number $1 < p < 2$ and the distance $d_p$ on $T$ given by

$$d_p(s,t)^p = \sum_{i\ge1} |a_i|^p |\chi_i(s) - \chi_i(t)|^p . \tag{7.102}$$

Assuming that the r.v.s $(\theta_i)_{i\ge1}$ satisfy (7.55), if $\gamma_q(T, d_p) < \infty$ (where $q$ is the conjugate exponent of $p$), then the series $\sum_{i\ge1} a_i\theta_i\chi_i$ converges a.s.

(c) With the same notation as in (b), if the r.v.s $(\theta_i)_{i\ge1}$ satisfy (7.91) and if the series $\sum_{i\ge1} a_i\theta_i\chi_i$ converges a.s., then $\gamma_q(T, d_p) < \infty$.


7.6 A Primer on Random Sets 


The purpose of the present section is to bring forward several very elementary facts which will be used on several occasions in the rest of the present chapter and in Chap. 11. As these facts will be part of non-trivial proofs, it may help the reader to meet them first in their simplest setting. We consider a probability space $(T, \mu)$ and a random subset $T_\omega$ of $T$, where we symbolize the randomness by a point $\omega$ in a certain probability space $\Omega$. We are mostly interested in the case where these random subsets are typically very small. (Going to complements, this also covers the case where the complements of these sets are very small.) We assume enough measurability, and for $s \in T$ we define

$$p(s) = \mathsf P(s \in T_\omega) .$$

35 Unfortunately, this is not entirely obvious.
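Lemma 7.6.1 We have

$$\mathsf E\mu(T_\omega) = \int p(s)\,d\mu(s) .$$

Proof This is simply Fubini's theorem. Consider the set $\Theta = \{(s,\omega) ;\ s \in T_\omega\} \subset T \times \Omega$. Then

$$\mu \otimes \mathsf P(\Theta) = \int_T \mathsf P(s \in T_\omega)\,d\mu(s) = \int p(s)\,d\mu(s)$$

and also

$$\mu \otimes \mathsf P(\Theta) = \int_\Omega \mu(T_\omega)\,d\mathsf P(\omega) = \mathsf E\mu(T_\omega) . \qquad \square$$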
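On finite spaces, Lemma 7.6.1 is just the statement that a double sum can be computed in either order. The sketch below checks this with exact rational arithmetic on a randomly generated example; the random-set model (each $s$ included independently with probability 0.3 for each outcome $\omega$) and all names are illustrative assumptions. It also checks (7.103) of the next lemma with the choice $A = \{s ;\ p(s) > \epsilon\}$.

```python
import random
from fractions import Fraction


def random_instance(n_points, n_outcomes, seed):
    """Hypothetical finite example: probability weights mu on T and P on Omega,
    and the random set T_omega encoded as member[omega][s] (True if s is in T_omega)."""
    rng = random.Random(seed)
    raw_mu = [Fraction(rng.randint(1, 9)) for _ in range(n_points)]
    raw_P = [Fraction(rng.randint(1, 9)) for _ in range(n_outcomes)]
    total_mu, total_P = sum(raw_mu), sum(raw_P)
    mu = [x / total_mu for x in raw_mu]
    P = [x / total_P for x in raw_P]
    member = [[rng.random() < 0.3 for _ in range(n_points)] for _ in range(n_outcomes)]
    return mu, P, member


def expected_measure(mu, P, member):
    """E mu(T_omega): sum over omega first, then over s in T_omega."""
    return sum(P[w] * sum(mu[s] for s in range(len(mu)) if member[w][s])
               for w in range(len(P)))


def integral_of_p(mu, P, member):
    """int p dmu with p(s) = P(s in T_omega): sum over s first. Also returns p."""
    p = [sum(P[w] for w in range(len(P)) if member[w][s]) for s in range(len(mu))]
    return sum(mu[s] * p[s] for s in range(len(mu))), p


if __name__ == "__main__":
    mu, P, member = random_instance(12, 7, seed=1)
    lhs = expected_measure(mu, P, member)
    rhs, p = integral_of_p(mu, P, member)
    print(lhs == rhs, lhs)                          # Lemma 7.6.1, exactly
    eps = Fraction(1, 4)
    mu_A = sum(mu[s] for s in range(len(mu)) if p[s] > eps)
    print(lhs <= mu_A + eps)                        # (7.103) with A = {p > eps}
```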


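The following result quantifies that $\mathsf E\mu(T_\omega)$ is small if $p(s)$ is typically small:

Lemma 7.6.2 Consider a subset $A$ of $T$ and assume that for a certain $\epsilon > 0$ we have $s \notin A \Rightarrow p(s) \le \epsilon$. Then

$$\mathsf E\mu(T_\omega) \le \mu(A) + \epsilon . \tag{7.103}$$

Proof Since $p(s) \le 1$ for any $s \in T$ and $p(s) \le \epsilon$ for $s \in A^c$, we have

$$\int p(s)\,d\mu(s) = \int_A p(s)\,d\mu(s) + \int_{A^c} p(s)\,d\mu(s) \le \mu(A) + \epsilon . \qquad \square$$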


We will use this result when $\epsilon$ is overwhelmingly small, say $\epsilon = 1/N_n$. In that case we will be able to show that $\mu(T_\omega)$ is small with overwhelming probability simply by using Markov's inequality as in the following:

Lemma 7.6.3 Assume that for some $\epsilon > 0$ there is a subset $A$ of $T$ with $\mu(A) \le \epsilon$ and $p(s) \le \epsilon$ for $s \notin A$. Then

$$\mathsf P(\mu(T_\omega) \ge 2\sqrt\epsilon) \le \sqrt\epsilon .$$

Proof Indeed $\mathsf E\mu(T_\omega) \le 2\epsilon$ by Lemma 7.6.2, and the conclusion follows by Markov's inequality, since $2\epsilon/(2\sqrt\epsilon) = \sqrt\epsilon$. □

Generally speaking, the use of Fubini's theorem in the present situation is often precious, as in the following result:

Lemma 7.6.4 Let $c = \mathsf E\mu(T_\omega)$. Then for $b < c$ we have

$$\mathsf P(\mu(T_\omega) \ge b) \ge \frac{c - b}{1 - b} .$$

Proof Denoting by $\Omega_0$ the event $\mu(T_\omega) \ge b$, we write (using also that $\mu(T_\omega) \le 1$)

$$c = \mathsf E\mu(T_\omega) = \mathsf E(\mathbf 1_{\Omega_0}\mu(T_\omega)) + \mathsf E(\mathbf 1_{\Omega_0^c}\mu(T_\omega)) \le \mathsf P(\Omega_0) + b(1 - \mathsf P(\Omega_0)) . \qquad \square$$
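Lemma 7.6.4 can be checked in the same spirit. In the sketch below (again a hypothetical finite example, with exact rational arithmetic and each $s$ included in $T_\omega$ independently with probability 1/2), we compute $c = \mathsf E\mu(T_\omega)$ and compare $\mathsf P(\mu(T_\omega) \ge b)$ with $(c - b)/(1 - b)$ for several values $b < c$.

```python
import random
from fractions import Fraction


def random_measures(n_points, n_outcomes, seed):
    """Hypothetical finite example: returns P(omega) and the values mu(T_omega) <= 1."""
    rng = random.Random(seed)
    raw_mu = [Fraction(rng.randint(1, 9)) for _ in range(n_points)]
    raw_P = [Fraction(rng.randint(1, 9)) for _ in range(n_outcomes)]
    total_mu, total_P = sum(raw_mu), sum(raw_P)
    mu = [x / total_mu for x in raw_mu]
    P = [x / total_P for x in raw_P]
    sizes = []
    for _ in range(n_outcomes):
        T_omega = [s for s in range(n_points) if rng.random() < 0.5]
        sizes.append(sum((mu[s] for s in T_omega), Fraction(0)))
    return P, sizes


def check_764(P, sizes, b):
    """Return (P(mu(T_omega) >= b), (c - b)/(1 - b)) where c = E mu(T_omega)."""
    c = sum(pw * x for pw, x in zip(P, sizes))
    lhs = sum((pw for pw, x in zip(P, sizes) if x >= b), Fraction(0))
    return lhs, (c - b) / (1 - b)


if __name__ == "__main__":
    P, sizes = random_measures(10, 8, seed=0)
    c = sum(pw * x for pw, x in zip(P, sizes))
    for b in (c / 4, c / 2, 3 * c / 4):
        print(b, *check_764(P, sizes, b))
```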


Exercise 7.6.5 Consider numbers $0 < b, c < 1$, a set $T$ and a probability measure $\mu$ on $T$. For each $t \in T$ consider an event $\Xi_t$ with $\mathsf P(\Xi_t) \ge c$. Then the event $\Omega$ defined by $\mu(\{t ;\ \omega \in \Xi_t\}) \ge b$ satisfies $\mathsf P(\Omega) \ge (c - b)/(1 - b)$.


7.7 Proofs, Lower Bounds 


As we already explained, the main idea is to work given the r.v.s $Z_i$. The way to explain the proof strongly depends on whether one assumes that the reader has internalized the basic mechanisms of dealing with two independent sources of randomness or whether one does not make this assumption. Trying to address both classes of readers, we will give full technical details in the first result where the two sources of randomness really occur, Lemma 7.7.5.$^{36}$

Let us define the distance $d_\omega(s,t)$ on $T$ by

$$d_\omega(s,t)^2 = \sum_i |Z_i(s) - Z_i(t)|^2 . \tag{7.104}$$


This distance depends on the random quantities $Z_i$, so it is a random distance. Here and in the rest of the chapter, the letter $\omega$ symbolizes the randomness of the $Z_i$, so an implicit feature of the notation $d_\omega(s,t)$ is that this distance depends only on the randomness of the $Z_i$ but not on the randomness of the $\varepsilon_i$. One should form the following mental picture: working given the r.v.s $Z_i$ means working at a fixed value of $\omega$.

Our goal now is to control $\gamma_2(T, d_\omega)$ from below.$^{37}$ The plan is to control $d_\omega$ from below and to show that consequently the balls with respect to $d_\omega$ are small. The basic estimate is as follows:

Lemma 7.7.1 Assume that for a certain $j \in \mathbb Z$ the point $s \in T$ satisfies

$$\varphi_{j+1}(s,0) \ge 2^n . \tag{7.105}$$

36 The mathematics involved are no more complicated than Fubini's theorem.

37 This quantity involves only the randomness of the $Z_i$.




Then

$$\mathsf P\big(d_\omega(s,0) \le 2^{n/2-1} r^{-j-1}\big) \le \mathsf P\Big(\sum_i \big(|r^{j+1}(Z_i(s) - Z_i(0))|^2 \wedge 1\big) \le 2^{n-2}\Big) \le N_{n-2}^{-1} . \tag{7.106}$$

The first inequality is obvious from the definition (7.104) of $d_\omega$, since when $d_\omega(s,0) \le r^{-j-1} 2^{n/2-1}$ we have $\sum_i |r^{j+1}(Z_i(s) - Z_i(0))|^2 \le 2^{n-2}$. The proof of the second inequality requires elementary probabilistic inequalities to which we turn now.$^{38}$ These estimates will be used many times.


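Lemma 7.7.2 Consider independent r.v.s $(W_i)_{i\ge1}$ with $0 \le W_i \le 1$.

(a) If $4A \le \sum_{i\ge1} \mathsf EW_i$, then

$$\mathsf P\Big(\sum_{i\ge1} W_i \le A\Big) \le \exp(-A) .$$

(b) If $A \ge 4\sum_{i\ge1} \mathsf EW_i$, then

$$\mathsf P\Big(\sum_{i\ge1} W_i \ge A\Big) \le \exp\Big(-\frac A2\Big) .$$

Proof

(a) Since $1 - x \le e^{-x} \le 1 - x/2$ for $0 \le x \le 1$, we have

$$\mathsf E\exp(-W_i) \le 1 - \frac{\mathsf EW_i}{2} \le \exp\Big(-\frac{\mathsf EW_i}{2}\Big)$$

and thus

$$\mathsf E\exp\Big(-\sum_{i\ge1} W_i\Big) \le \exp\Big(-\frac12\sum_{i\ge1} \mathsf EW_i\Big) \le \exp(-2A) .$$

We conclude with the inequality $\mathsf P(Z \le A) \le \exp(A)\,\mathsf E\exp(-Z)$.

(b) Observe that $1 + x \le e^x \le 1 + 2x$ for $0 \le x \le 1$, so, as before,

$$\mathsf E\exp\sum_{i\ge1} W_i \le \exp\Big(2\sum_{i\ge1} \mathsf EW_i\Big) \le \exp\frac A2 ,$$

and we use now that $\mathsf P(Z \ge A) \le \exp(-A)\,\mathsf E\exp Z$. □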
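Both bounds of Lemma 7.7.2 are easy to test on a case where the law of $\sum_i W_i$ can be computed exactly. The sketch below assumes, purely for illustration, that the $W_i$ are independent Bernoulli variables with $\mathsf P(W_i = 1) = q_i$ (so that $0 \le W_i \le 1$); it computes the law of $\sum_i W_i$ by repeated convolution and compares the two tail probabilities with $\exp(-A)$ and $\exp(-A/2)$.

```python
import math


def sum_distribution(qs):
    """Law of sum_i W_i for independent Bernoulli(q_i) variables, by repeated convolution."""
    dist = [1.0]
    for q in qs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - q)      # W_i = 0
            new[k + 1] += mass * q          # W_i = 1
        dist = new
    return dist


def lower_tail(dist, A):
    """P(sum_i W_i <= A)."""
    return sum(mass for k, mass in enumerate(dist) if k <= A)


def upper_tail(dist, A):
    """P(sum_i W_i >= A)."""
    return sum(mass for k, mass in enumerate(dist) if k >= A)


if __name__ == "__main__":
    qs = [0.05 + 0.01 * i for i in range(20)]   # hypothetical q_i
    mean = sum(qs)                               # sum_i E W_i
    dist = sum_distribution(qs)
    for A in (mean / 4, mean / 8):               # case (a): 4A <= sum_i E W_i
        print("(a)", A, lower_tail(dist, A), math.exp(-A))
    for A in (4 * mean, 6 * mean):               # case (b): A >= 4 sum_i E W_i
        print("(b)", A, upper_tail(dist, A), math.exp(-A / 2))
```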


38 Much more general and sharper results exist in the same direction, but the simple form we 
provide suffices for our needs. 
Proof of Lemma 7.7.1 Let $W_i = |r^{j+1}(Z_i(s) - Z_i(0))|^2 \wedge 1$ and $A = 2^{n-2}$. Then $\sum_i \mathsf EW_i = \varphi_{j+1}(s,0) \ge 2^n = 4A$. The result then follows from Lemma 7.7.2(a), since $\exp(-2^{n-2}) \le 2^{-2^{n-2}} \le N_{n-2}^{-1}$. □


We can now give the central argument. 


Lemma 7.7.3 With probability $\ge 1/2$ it occurs that

$$\sum_{n\ge5} 2^n r^{-\bar j_n} \le Lr\gamma_2(T, d_\omega) . \tag{7.107}$$

The idea is very simple. Assuming $\bar j_n < \infty$, if a point $s \in T$ satisfies

$$\varphi_{\bar j_n+1}(s,0) \ge 2^n , \tag{7.108}$$

(7.106) shows that for most of the choices of the randomness of $\omega$ it holds that $d_\omega(s,0) > 2^{n/2-1} r^{-\bar j_n-1}$. The definition of $\bar j_n$ shows that all but very few of the points $s$ satisfy (7.108). Thus, for most of the choices of the randomness of the $Z_i$, there will be only few points in $T$ which satisfy $d_\omega(s,0) \le 2^{n/2-1} r^{-\bar j_n-1}$, and this certainly contributes to make $\gamma_2(T, d_\omega)$ large. Using this information for many values of $n$ at the same time carries the day. The magic is that all the estimates fall very nicely into place.

We now start the proof of Lemma 7.7.3. The sequence $(\bar j_n)_{n\ge0}$ is obviously non-decreasing and $\bar j_n = \infty$ for $n$ large enough, because since we consider a finite sum, say of $N$ terms, for any value of $j$ we have $\varphi_j(s,t) \le N$. There is nothing to prove if $\bar j_n = \infty$ for $n \ge 5$, so we may consider the largest integer $n_0 \ge 5$ such that $\bar j_n < \infty$ for $n \le n_0$. Then $\sum_{n\ge5} 2^n r^{-\bar j_n} = \sum_{5\le n\le n_0} 2^n r^{-\bar j_n}$.


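Lemma 7.7.4 For $n \le n_0$ the event $\Omega_n$ defined by

$$\mu\Big(\Big\{s \in T ;\ \sum_i \big(|r^{\bar j_n+1}(Z_i(s) - Z_i(0))|^2 \wedge 1\big) \le 2^{n-2}\Big\}\Big) < \frac{1}{N_{n-3}} \tag{7.109}$$

satisfies $\mathsf P(\Omega_n) \ge 1 - 2/N_{n-3}$.

Proof We think of $n$ as fixed and we follow the strategy of Lemma 7.6.3 for the set

$$T_\omega = \Big\{s \in T ;\ \sum_i \big(|r^{\bar j_n+1}(Z_i(s) - Z_i(0))|^2 \wedge 1\big) \le 2^{n-2}\Big\} ,$$

where $\omega$ symbolizes the randomness of the $Z_i$. That is, we bound $\mathsf E\mu(T_\omega)$ and we use Markov's inequality. The definition of $\bar j_n$ implies that the set $A = \{s ;\ \varphi_{\bar j_n+1}(s,0) \le 2^n\}$ satisfies

$$\mu(A) < N_n^{-1} . \tag{7.110}$$

Next, if we assume that $s \notin A$, then

$$\varphi_{\bar j_n+1}(s,0) = \sum_i \mathsf E\big(|r^{\bar j_n+1}(Z_i(s) - Z_i(0))|^2 \wedge 1\big) > 2^n ,$$

and by (7.106) it holds that $p(s) := \mathsf P(s \in T_\omega) \le 1/N_{n-2}$. Using (7.103) for $\epsilon = 1/N_{n-2}$, we then have

$$\mathsf E\mu(T_\omega) \le \mu(A) + \epsilon \le \frac{1}{N_n} + \frac{1}{N_{n-2}} \le \frac{2}{N_{n-2}} , \tag{7.111}$$

where we have also used (7.110). Thus, by Markov's inequality, we have $\mathsf P(\mu(T_\omega) \ge 1/N_{n-3}) \le 2N_{n-3}/N_{n-2}$ and then, since $N_{n-2} = N_{n-3}^2$,

$$\mathsf P(\mu(T_\omega) < 1/N_{n-3}) \ge 1 - 2N_{n-3}/N_{n-2} = 1 - 2/N_{n-3} . \qquad \square$$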


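Proof of Lemma 7.7.3 As a consequence of Lemma 7.7.4, the event

$$\Omega = \bigcap_{5\le n\le n_0} \Omega_n \tag{7.112}$$

satisfies $\mathsf P(\Omega^c) \le \sum_{n\ge5} \mathsf P(\Omega_n^c) \le \sum_{n\ge5} 2/N_{n-3} \le 1/2$, so that $\mathsf P(\Omega) \ge 1/2$. Moreover, since

$$\sum_i \big(|r^{\bar j_n+1}(Z_i(s) - Z_i(0))|^2 \wedge 1\big) \le \sum_i |r^{\bar j_n+1}(Z_i(s) - Z_i(0))|^2 = r^{2\bar j_n+2} d_\omega(s,0)^2 ,$$

(7.109) yields, for $\omega \in \Omega_n$,

$$\mu(\{s \in T ;\ d_\omega(s,0) \le r^{-\bar j_n-1} 2^{n/2-1}\}) < \frac{1}{N_{n-3}} .$$

It follows that when $\Omega$ occurs, the number $\epsilon_n = \epsilon_n(\omega)$ as in (7.3) (computed for the distance $d_\omega$) satisfies $\epsilon_{n-3}(\omega) \ge r^{-\bar j_n-1} 2^{n/2-1}$. Consequently

$$\sum_{5\le n\le n_0} 2^n r^{-\bar j_n} \le Lr\sum_{n\ge0} 2^{n/2}\epsilon_n(\omega) \le Lr\gamma_2(T, d_\omega) ,$$

where we use (7.4) in the last inequality. We have proved that (7.107) holds for $\omega \in \Omega$, and hence with probability $\ge 1/2$. □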
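The facts about the sequence $N_n$ used in Lemmas 7.7.1, 7.7.4 and 7.7.3, and the constant $31 = \sum_{n\le4} 2^n$ appearing in the proof of Theorem 7.5.1, are elementary, but they can be checked mechanically. The short script below (a sanity check over a finite range of $n$, using exact integer and rational arithmetic where possible) verifies that $\exp(-2^{n-2}) \le N_{n-2}^{-1}$, that $N_{n-2} = N_{n-3}^2$ and $1/N_n + 1/N_{n-2} \le 2/N_{n-2}$ for $n \ge 5$, and that the truncated sum $\sum_{n\ge5} 2/N_{n-3}$ stays below $1/2$; truncation is harmless since the terms decrease extremely fast.

```python
import math
from fractions import Fraction


def N(n):
    """N_0 = 1 and N_n = 2^(2^n) for n >= 1."""
    return 1 if n == 0 else 2 ** (2 ** n)


def check_lemma_771(n):
    """exp(-2^(n-2)) <= 1/N_{n-2}, compared on the log scale to avoid underflow."""
    return -(2 ** (n - 2)) <= -math.log(N(n - 2))


def check_lemma_774(n):
    """N_{n-2} = N_{n-3}^2 and 1/N_n + 1/N_{n-2} <= 2/N_{n-2}."""
    return (N(n - 2) == N(n - 3) ** 2
            and Fraction(1, N(n)) + Fraction(1, N(n - 2)) <= Fraction(2, N(n - 2)))


def failure_bound(n_max):
    """Truncation of sum_{n >= 5} 2/N_{n-3}, the bound on P(Omega^c) in Lemma 7.7.3."""
    return sum(Fraction(2, N(n - 3)) for n in range(5, n_max + 1))


if __name__ == "__main__":
    print(all(check_lemma_771(n) for n in range(2, 13)))
    print(all(check_lemma_774(n) for n in range(5, 13)))
    print(float(failure_bound(12)))         # about 0.133, well below 1/2
    print(sum(2 ** n for n in range(5)))    # 31, the constant in the proof of Theorem 7.5.1
```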


Lemma 7.7.5 There exists a constant $\alpha_1 > 0$ such that if

$$\mathsf P\Big(\Big\|\sum_i \varepsilon_i Z_i\Big\| \ge M\Big) < \alpha_1 ,$$

then

$$\sum_{n\ge5} 2^n r^{-\bar j_n} \le LrM . \tag{7.113}$$

This is the first result involving both the randomness of the $Z_i$ and the randomness of the $\varepsilon_i$. The proof consists in proving the existence of a constant $\alpha_1 > 0$ such that

$$\mathsf P\Big(\Big\|\sum_i \varepsilon_i Z_i\Big\| \ge \frac{1}{Lr}\sum_{n\ge5} 2^n r^{-\bar j_n}\Big) \ge \alpha_1 . \tag{7.114}$$

Proof for the Probabilist The inequality (7.114) follows by using Lemma 7.7.3 first and then Lemma 7.4.6 given the r.v.s $Z_i$. □


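Proof for the Novice in Probability This proof consists in detailing the mechanism at hand in the previous argument. We assume, as on page 214, that the underlying probability space is a product $\Omega \times \Omega'$, with a product probability $\mathsf P = \mathsf P_Z \otimes \mathsf P_\varepsilon$, and that if $(\omega, \omega')$ is the generic point of this product, then $Z_i = Z_{i,\omega}$ depends on $\omega$ only and $\varepsilon_i = \varepsilon_{i,\omega'}$ depends on $\omega'$ only. By Fubini's theorem, for a set $A \subset \Omega \times \Omega'$,

$$\mathsf P(A) = \int \mathsf P_\varepsilon(\{\omega' \in \Omega' ;\ (\omega, \omega') \in A\})\,d\mathsf P_Z(\omega) .$$

In particular, for any set $B \subset \Omega$ we have

$$\mathsf P(A) \ge \mathsf P_Z(B)\inf_{\omega \in B} \mathsf P_\varepsilon(\{\omega' \in \Omega' ;\ (\omega, \omega') \in A\}) . \tag{7.115}$$

By Lemma 7.7.3 the set $B = \{\omega \in \Omega ;\ \sum_{n\ge5} 2^n r^{-\bar j_n} \le Lr\gamma_2(T, d_\omega)\}$ satisfies $\mathsf P_Z(B) \ge 1/2$. Using (7.45) at a given value of $\omega$ we obtain

$$\mathsf P_\varepsilon\Big(\Big\{\omega' \in \Omega' ;\ \Big\|\sum_i \varepsilon_i Z_i\Big\| \ge \frac1L\gamma_2(T, d_\omega)\Big\}\Big) \ge \frac1L ,$$

so that for $\omega \in B$ we have

$$\mathsf P_\varepsilon(\{\omega' \in \Omega' ;\ (\omega, \omega') \in A\}) \ge \frac1L ,$$

where $A = \{(\omega, \omega') ;\ \|\sum_i \varepsilon_i Z_i\| \ge \frac{1}{Lr}\sum_{n\ge5} 2^n r^{-\bar j_n}\}$, and (7.115) proves (7.114) with $\alpha_1 = 1/(2L)$. □

It is unfortunate that in (7.113) the summation starts at $n = 5$, for otherwise we would be done. The next result addresses this problem.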


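Proposition 7.7.6 There exists a constant $\alpha_2 > 0$ with the following property. Assume that for a certain number $M$ we have

$$\mathsf P\Big(\Big\|\sum_i \varepsilon_i Z_i\Big\| \ge M\Big) < \alpha_2 . \tag{7.116}$$

Then the number $\bar j_0$ of (7.69) satisfies $r^{-\bar j_0} \le LrM$.

Lemma 7.7.7 Consider independent complex-valued r.v.s $U_i$ and independent Bernoulli r.v.s $\varepsilon_i$ that are independent of the r.v.s $U_i$. Assume that

$$\sum_{i\ge1} \mathsf E(|U_i|^2 \wedge 1) \ge 1 . \tag{7.117}$$

Then

$$\mathsf P\Big(\Big|\sum_i \varepsilon_i U_i\Big| \ge \frac1L\Big) \ge \frac1L . \tag{7.118}$$

Proof We use Lemma 7.7.2(a) with $W_i = |U_i|^2 \wedge 1$ and $A = 1/4$ to obtain

$$\mathsf P\Big(\sum_{i\ge1} |U_i|^2 \wedge 1 \ge \frac14\Big) \ge 1 - e^{-1/4} . \tag{7.119}$$

On this event, $(\sum_i |U_i|^2)^{1/2} \ge 1/2$. Now, (6.16) implies that $\mathsf P_\varepsilon(|\sum_i \varepsilon_i U_i| \ge (\sum_i |U_i|^2)^{1/2}/L) \ge 1/L$, where $\mathsf P_\varepsilon$ denotes the conditional probability given the randomness of the $(U_i)_{i\ge1}$. Combining with (7.119) concludes the proof. □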


Exercise 7.7.8 Write all details of the previous argument in the spirit of the second 
proof of Lemma 7.7.5. 


Exercise 7.7.9 Assuming $\sum_{i\ge1} \mathsf E(|U_i|^2 \wedge 1) \ge \beta > 0$, prove that then $\mathsf P(|\sum_i \varepsilon_i U_i| \ge 1/K) \ge 1/K$, where $K$ depends on $\beta$ only.


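Proof of Proposition 7.7.6 Let us denote by $L_0$ the constant of (7.118). We will show that $\alpha_2 = 1/L_0$ works. Since

$$\Big|\sum_i \varepsilon_i(Z_i(s) - Z_i(t))\Big| \le 2\Big\|\sum_i \varepsilon_i Z_i\Big\| ,$$

(7.116) implies

$$\forall s,t \in T ,\quad \mathsf P\Big(\Big|\sum_i \varepsilon_i(Z_i(s) - Z_i(t))\Big| \ge 2M\Big) \le \mathsf P\Big(\Big\|\sum_i \varepsilon_i Z_i\Big\| \ge M\Big) < \frac{1}{L_0} . \tag{7.120}$$

The condition $\mathsf P(|\sum_i \varepsilon_i(Z_i(s) - Z_i(t))| \ge 2M) < 1/L_0$ means that if we set $U_i = w(Z_i(s) - Z_i(t))$, where $w = 1/(2L_0M)$, we have $\mathsf P(|\sum_i \varepsilon_i U_i| \ge 1/L_0) < 1/L_0$. Consequently (7.118) fails when $U_i = w(Z_i(s) - Z_i(t))$, and therefore Lemma 7.7.7 implies

$$\forall s,t \in T ,\quad \sum_i \mathsf E\big(|w(Z_i(s) - Z_i(t))|^2 \wedge 1\big) < 1 . \tag{7.121}$$

Let $j^*$ be the largest integer with $r^{j^*} \le w$, so that $r^{j^*+1} > w$ and thus $r^{-j^*} < r/w = 2L_0rM = LrM$. Since $r^{j^*} \le w$, (7.121) implies

$$\forall s,t \in T ,\quad \varphi_{j^*}(s,t) \le 1 .$$

Consequently, $\bar j_0 \ge j^*$ and therefore $r^{-\bar j_0} \le r^{-j^*} \le LrM$. □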
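The choice of $j^*$ at the end of this proof is a small piece of integer arithmetic that is easy to get wrong by one. A minimal sketch with exact rationals: it finds the largest integer $j$ with $r^j \le w$ and checks that then $r^{-j} < r/w$, which is the inequality used above with $w = 1/(2L_0M)$. The value of $L_0$ below is a placeholder, since the text only asserts that some universal constant works.

```python
from fractions import Fraction


def largest_exponent(r, w):
    """Largest integer j with r**j <= w, for an integer r >= 2 and a rational w > 0."""
    r, w = Fraction(r), Fraction(w)
    j = 0
    while r ** j > w:          # move down until r^j <= w
        j -= 1
    while r ** (j + 1) <= w:   # move up while the next power still fits
        j += 1
    return j


L0 = 7                          # hypothetical value of the constant of (7.118)
for r in (2, 4, 10):
    for M in (Fraction(1, 3), Fraction(5), Fraction(1000, 7)):
        w = 1 / (2 * L0 * M)   # the scaling used in the proof
        j_star = largest_exponent(r, w)
        assert Fraction(r) ** j_star <= w < Fraction(r) ** (j_star + 1)
        assert Fraction(r) ** (-j_star) < r / w == 2 * L0 * r * M
```

Exact rationals are used instead of `math.log` so that boundary cases such as $w = r^{-3}$ are classified correctly.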


Exercise 7.7.10 Given a number $\beta > 0$, prove that there is a number $\alpha(\beta)$ depending on $\beta$ only such that if $\mathsf P(\|\sum_i \varepsilon_i Z_i\| \ge M) \le \alpha(\beta)$, there exists $j_0$ with $\varphi_{j_0}(s,t) \le \beta$ for $s,t \in T$ and $r^{-j_0} \le KM$, where $K$ depends on $\beta$ only.


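Proof of Theorem 7.5.1 We show that any constant $\alpha_0 \le \min(\alpha_1, \alpha_2)$ works. Let us assume that for a certain number $M$ we have

$$\mathsf P\Big(\Big\|\sum_i \varepsilon_i Z_i\Big\| \ge M\Big) < \alpha_0 .$$

It then follows from (7.113) that $\sum_{n\ge5} 2^n r^{-\bar j_n} \le LrM$ and from Proposition 7.7.6 that $r^{-\bar j_0} \le LrM$. Since $\bar j_n \ge \bar j_0$, we have $\sum_{0\le n\le4} 2^n r^{-\bar j_n} \le 31r^{-\bar j_0}$, and we have proved that $\sum_{n\ge0} 2^n r^{-\bar j_n} \le LrM$. □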


7.8 Proofs, Upper Bounds 


7.8.1 Road Map 


We are done with the lower bounds. It is not obvious yet, but the arguments for these lower bounds are potentially very general, and we will meet them later in Chap. 11. Our goal now is to work toward the upper bounds, proving Theorem 7.5.5. The crucial case of the result is when the $Z_i$ are not random, $Z_i = a_i\chi_i$ for a complex number $a_i$ and a character $\chi_i$.

Starting now, the arguments are somewhat specific to random trigonometric sums 
and use translation invariance in a fundamental way. Thus, the remainder of this 
chapter may be skipped by the reader who is not interested in random Fourier series 
per se, although it should certainly be very helpful for the sequel to understand 
Theorem 7.5.14 (the decomposition theorem) since similar but far more difficult 
results are the object of much later work. 

The crucial inequality (7.145) below pertains to the case $Z_i = a_i\chi_i$ for a complex number $a_i$ and a character $\chi_i$. The basic mechanism at work in the proof of (7.145) is a "miniature version" of Theorem 7.5.14. Once we have this inequality, the more general upper bounds we need will be obtained by using it at given values of the $Z_i$. This is the object of Sect. 7.8.6. We will still need to complement these bounds by very classical considerations in Sect. 7.8.7 before we can finally prove Theorem 7.5.8.
7.8.2 A Key Step


The importance of the following result will only become clear later. 


Theorem 7.8.1 Consider characters $\chi_i$ and assume that for a certain subset $A$ of $T$ and each $i$ we have $|\int_A \chi_i(s)\,d\mu(s)| \ge \mu(A)/2$. Then, given numbers $a_i$ and independent standard Gaussian r.v.s $g_i$, we have

$$\mathsf E\Big\|\sum_i a_i g_i \chi_i\Big\| \le L\Big(\sum_i |a_i|^2\Big)^{1/2}\sqrt{\log(2/\mu(A))} . \tag{7.122}$$

Having analyzed that this property was central for the decomposition theorem but being unable to prove it, I submitted this question to Gilles Pisier. He immediately pointed out that Theorem 7.1 of his paper [86] provides a positive answer in the case where the $a_i$ are all equal. Analysis of the arguments of [86] then easily led to the proof which we present here. To prove (7.122) we can assume that $\chi_i \ne 1$ for each $i$ by bounding separately the term where $\chi_i = 1$. Setting $v_i = \int_A \chi_i(s)\,d\mu(s)$, so that $|v_i| \ge \mu(A)/2$, we will compare suitable upper and lower bounds for the quantity $\mathsf E\|\sum_i a_i v_i g_i \chi_i\|$.

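The lower bound is easy. Consider the distances $d$ and $\bar d$ given by $d(s,t)^2 = \sum_i |a_i|^2 |\chi_i(s) - \chi_i(t)|^2$ and $\bar d(s,t)^2 = \sum_i |a_i v_i|^2 |\chi_i(s) - \chi_i(t)|^2$. Then since $|v_i| \ge \mu(A)/2$ we have $\bar d(s,t) \ge \mu(A)d(s,t)/2$, and thus $\mu(A)\gamma_2(T, d) \le L\gamma_2(T, \bar d)$.$^{39}$ Furthermore, $\mathsf E\|\sum_i a_i g_i \chi_i\| \le L\gamma_2(T, d)$ by (7.33) and $\gamma_2(T, \bar d) \le L\mathsf E\|\sum_i a_i v_i g_i \chi_i\|$ by (7.13). Consequently,

$$\frac{\mu(A)}{L}\,\mathsf E\Big\|\sum_i a_i g_i \chi_i\Big\| \le \mathsf E\Big\|\sum_i a_i v_i g_i \chi_i\Big\| . \tag{7.123}$$

The upper bound on the quantity $\mathsf E\|\sum_i a_i v_i g_i \chi_i\|$ is provided by the following:

Lemma 7.8.2 We have

$$\mathsf E\Big\|\sum_i a_i v_i g_i \chi_i\Big\| \le L\Big(\sum_i |a_i|^2\Big)^{1/2}\mu(A)\sqrt{\log(2/\mu(A))} . \tag{7.124}$$

Proof of Theorem 7.8.1 Combine (7.123) and (7.124). □

Before we can prove (7.124) we need two simple lemmas.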


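Lemma 7.8.3 Consider complex numbers $b_i$ with $\sum_i |b_i|^2 \le 1/8$. Then

$$\mathsf E\exp\Big|\sum_i b_i g_i\Big|^2 \le L .$$

Proof Separating the real and imaginary parts, $|\sum_i b_i g_i|^2 = h_1^2 + h_2^2$, where $h_1, h_2$ are Gaussian (not necessarily independent) with $\mathsf Eh_1^2 \le 1/8$ and $\mathsf Eh_2^2 \le 1/8$, and we simply use the Cauchy–Schwarz inequality $\mathsf E\exp(h_1^2 + h_2^2) \le (\mathsf E\exp 2h_1^2)^{1/2}(\mathsf E\exp 2h_2^2)^{1/2}$, together with the fact that $\mathsf E\exp 2h^2 = (1 - 4\mathsf Eh^2)^{-1/2} \le \sqrt2$ for a centered Gaussian $h$ with $\mathsf Eh^2 \le 1/8$. □

39 See Exercise 2.7.4.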
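The constant in Lemma 7.8.3 can also be computed exactly. For a centered Gaussian vector $(h_1, h_2)$ with covariance matrix $S$, one has the standard identity $\mathsf E\exp(h_1^2 + h_2^2) = \det(I - 2S)^{-1/2}$ whenever $I - 2S$ is positive definite. Since the eigenvalues of $S$ are nonnegative with sum $\sum_i |b_i|^2 \le 1/8$, this gives $\det(I - 2S) \ge 1 - 2\,\mathrm{tr}\,S \ge 3/4$, so one may take $L = (4/3)^{1/2}$. This is a variant of the argument above, not the text's own proof. The sketch computes the exact value for given complex $b_i$ and compares it with a seeded Monte Carlo estimate; the coefficients are arbitrary illustrative choices.

```python
import math
import random


def exact_mgf(b):
    """E exp|sum_i b_i g_i|^2 = det(I - 2S)^(-1/2), S = covariance of (Re, Im) parts."""
    s11 = sum(z.real ** 2 for z in b)
    s22 = sum(z.imag ** 2 for z in b)
    s12 = sum(z.real * z.imag for z in b)
    det = (1 - 2 * s11) * (1 - 2 * s22) - 4 * s12 ** 2
    return det ** -0.5


def monte_carlo_mgf(b, samples=100_000, seed=0):
    """Plain Monte Carlo estimate of E exp|sum_i b_i g_i|^2 with independent N(0, 1) g_i."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        s = sum(z * rng.gauss(0.0, 1.0) for z in b)
        total += math.exp(abs(s) ** 2)
    return total / samples


b = [0.2 + 0.1j, -0.15j, 0.1, 0.05 - 0.2j]   # sum |b_i|^2 = 1/8
if __name__ == "__main__":
    print(sum(abs(z) ** 2 for z in b), exact_mgf(b), monte_carlo_mgf(b))
```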


Lemma 7.8.4 For any function $f \ge 0$ on $G$,$^{40}$ any set $A$ with $\mu(A) > 0$ and any number $C > 0$ we have

$$\int_A f\,d\mu \le 4C\mu(A)\sqrt{\log(2/\mu(A))}\int \exp(f^2/C^2)\,d\mu . \tag{7.125}$$

Proof The function $x \mapsto x^{-1}\exp x^2$ increases for $x \ge 1$, so that, given a number $0 < a \le 1$, for $x \ge 2\sqrt{\log(2/a)}$ we have $x \le a\exp x^2$, because this is true for $x = 2\sqrt{\log(2/a)}$. Consequently, for $x \ge 0$ we have $x \le 2\sqrt{\log(2/a)} + a\exp x^2$. Therefore we have

$$f \le 2\sqrt{\log(2/\mu(A))} + \mu(A)\exp f^2 .$$

Integration over $A$ gives

$$\int_A f\,d\mu \le 2\mu(A)\sqrt{\log(2/\mu(A))} + \mu(A)\int \exp f^2\,d\mu \le 4\mu(A)\sqrt{\log(2/\mu(A))}\int \exp f^2\,d\mu ,$$

where we have used that $1 \le \int \exp f^2\,d\mu$ for the first term and $1 \le 2\sqrt{\log(2/\mu(A))}$ for the second term. This proves (7.125) by replacing $f$ by $f/C$. □
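The two steps of this proof can be checked numerically: the pointwise inequality $x \le 2\sqrt{\log(2/a)} + a\exp x^2$ for $x \ge 0$ and $0 < a \le 1$, and then (7.125) itself. Since the lemma is a general fact of measure theory, a finite probability space with uniform measure suffices for a test; the random $f$ and the random set $A$ below are illustrative assumptions.

```python
import math
import random


def pointwise_ok(x, a):
    """x <= 2 sqrt(log(2/a)) + a exp(x^2), the key inequality of the proof."""
    return x <= 2.0 * math.sqrt(math.log(2.0 / a)) + a * math.exp(x * x)


def check_7125(f, A, C):
    """Both sides of (7.125) for uniform mu on {0, ..., len(f) - 1} and a nonempty subset A."""
    n = len(f)
    mu_A = len(A) / n
    lhs = sum(f[s] for s in A) / n                                 # int_A f dmu
    integral = sum(math.exp((v / C) ** 2) for v in f) / n          # int exp(f^2 / C^2) dmu
    rhs = 4.0 * C * mu_A * math.sqrt(math.log(2.0 / mu_A)) * integral
    return lhs, rhs


if __name__ == "__main__":
    rng = random.Random(0)
    f = [3.0 * rng.random() for _ in range(50)]                # hypothetical f >= 0
    A = [s for s in range(50) if rng.random() < 0.2] or [0]    # hypothetical subset A
    print(check_7125(f, A, C=1.0))
```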


Proof of Lemma 7.8.2 Recalling the value of $v_i$ and that $\chi_i(s)\chi_i(t) = \chi_i(s + t)$, we have

$$\sum_i a_i v_i g_i \chi_i(t) = \int_A \sum_i a_i g_i \chi_i(s + t)\,d\mu(s) = \int_{A+t} \sum_i a_i g_i \chi_i(s)\,d\mu(s) .$$

Since $\mu(A) = \mu(A + t)$, using (7.125) in the inequality below we obtain

$$\begin{aligned}\Big\|\sum_i a_i v_i g_i \chi_i\Big\| &= \sup_t \Big|\sum_i a_i v_i g_i \chi_i(t)\Big| = \sup_t \Big|\int_{A+t} \sum_i a_i g_i \chi_i(s)\,d\mu(s)\Big|\\ &\le LC\mu(A)\sqrt{\log(2/\mu(A))}\int \exp\Big(\Big|\sum_i a_i g_i \chi_i(s)\Big|^2/C^2\Big)\,d\mu(s) ,\end{aligned}\tag{7.126}$$

where $C = (8\sum_i |a_i|^2)^{1/2}$. Lemma 7.8.3 used for $b_i = a_i\chi_i(s)/C$ shows that for each $s$ we have $\mathsf E\exp(|\sum_i a_i g_i \chi_i(s)|^2/C^2) \le L$, so that taking expectation in (7.126) concludes the proof of (7.124). □

40 The group structure is not being used here, and this lemma is a general fact of measure theory.


The result proved in the following exercise was discovered while trying to 
prove (7.122) and is in the same general direction: 


Exercise 7.8.5 


(a) Prove that there is a number a > 0 such that if z, z’ are complex numbers of 
modulus | with |z — 1| < a and |z’ — 1| < a then [Zz — z| > 4[z— |. 

(b) Consider 0 < a < ao, characters (x;);<N On a compact group T and the set 
A= {t € T,Vi < N,|xi(t) — 1| < a}. Consider the set B = {t € T,Vi < 
N, |xi(t) — 1| < w/2}. Consider a subset U of A such thats,t e U,s At> 
s —t ¢ B. Prove that the sets 5t + A fort € U are disjoint. 

(c) Prove that $\mu(B) \ge \mu(A)^2$.


7.8.3 Road Map: An Overview of Decomposition Theorems 


Given a stochastic process $(X_t)_{t\in T}$, we would like to decompose it as a sum of two (or more) simpler pieces, $X_t = X_t^1 + X_t^2$, where, say, it will be far easier to control the size of each of the processes $(X_t^1)_{t\in T}$ and $(X_t^2)_{t\in T}$ than that of $(X_t)_{t\in T}$, because each of the two pieces can be controlled by a specific method. Furthermore, when the process $(X_t)_{t\in T}$ has a certain property, say $X_t = \sum_i Z_i(t)$ is the sum of independent terms, we would like each of the pieces $X_t^1$ and $X_t^2$ to have the same property. This will be achieved in Chap. 11 in a very general setting and in the present chapter in the case where $Z_i \in \mathcal{C}_G$. In the next section, the decomposition will take a very special form, of the type $X_t = \sum_{i\in I} a_i\chi_i(t) = X_t^1 + X_t^2$, where $X_t^k = \sum_{i\in I_k} a_i\chi_i(t)$ for two disjoint sets $I_1, I_2$ with $I = I_1 \cup I_2$. This is accidental. In general, the decomposition is more complicated: each piece $Z_i$ has to be split in a non-trivial manner, as will be done in Sect. 7.9.

Although this will not be used before Chap. 11, to help the reader form a correct picture, let us explain that the decomposition of stochastic processes in a sum of simpler pieces is also closely related to the decomposition of sets of functions as in Sect. 6.7 or as in the more complex results of Chap. 9. This is because many of the processes we consider in this book (but not in this chapter) are naturally indexed by sets of functions (in this chapter the process is indexed by the group $T$). A prime example of this is the process $X_t = \sum_{i\in I} g_i t_i$ (where $(g_i)$ are independent standard Gaussian r.v.s), which is indexed by $\ell^2(I)$ (where $I$ is a finite set). For such processes the natural way to decompose $X_t$ into a sum of two pieces is to decompose $t$ itself into such a sum. The main theorems of Chap. 11 are obtained precisely by an abstract version of this method.
7.8.4 Decomposition Theorem in the Bernoulli Case

Our main result in this section is closely related to Theorem 7.5.14 in the case where $Z_i$ is nonrandom, $Z_i = a_i\chi_i$. It will be the basis of our upper bounds. We consider a finite set $I$, numbers $(a_i)_{i\in I}$, characters $(\chi_i)_{i\in I}$ with $\chi_i \ne 1$, and for $j \in \mathbb{Z}$ we define
$$\psi_j(s,t) = \sum_i |r^j a_i(\chi_i(s) - \chi_i(t))|^2 \wedge 1 , \tag{7.127}$$
where as usual $\sum_i$ is a shorthand for $\sum_{i\in I}$. Consider a parameter $w \ge 1$ and for $n \ge 0$ integers $j_n \in \mathbb{Z}$. Consider the set
$$D_n = \{t \in T ;\ \psi_{j_n}(t,0) \le w2^n\} . \tag{7.128}$$


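Theorem 7.8.6 Assume the following conditions:
$$\mu(D_0) \ge \frac34 , \tag{7.129}$$
$$\forall n \ge 1 ,\quad \mu(D_n) \ge N_{n+1}^{-1} . \tag{7.130}$$
Then we can decompose $I$ as a disjoint union of three subsets $I_1, I_2, I_3$ with the following properties:
$$I_1 = \{i \in I ;\ |a_i| \ge r^{-j_0}\} , \tag{7.131}$$
$$\sum_{i\in I_2}|a_i| \le Lw\sum_{n\ge0}2^n r^{-j_n} , \tag{7.132}$$
$$E\Big\|\sum_{i\in I_3} a_i g_i\chi_i\Big\| \le L\sqrt{w}\sum_{n\ge0}2^n r^{-j_n} , \tag{7.133}$$
where $(g_i)_{i\in I_3}$ are independent standard Gaussian r.v.s.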
To prepare for the proof, a basic observation is that as a consequence of (7.128) we have
$$\int_{D_n}\psi_{j_n}(s,0)\,d\mu(s) \le w2^n\mu(D_n) ,$$
and using the definition of $\psi_j$ this means that
$$\int_{D_n}\sum_i |r^{j_n}a_i(\chi_i(s)-1)|^2\wedge 1\,d\mu(s) \le w2^n\mu(D_n) . \tag{7.134}$$

For each $n \ge 0$ we define
$$U_n = \Big\{i \in I ;\ \int_{D_n}|\chi_i(s)-1|^2\,d\mu(s) \ge \mu(D_n)\Big\} . \tag{7.135}$$


The idea of this definition is that if $i \in U_n$, then $\chi_i(s)$ is not too close to 1 on $D_n$, so that $\int_{D_n}|r^{j_n}a_i(\chi_i(s)-1)|^2\wedge 1\,d\mu(s)$ should be about $\mu(D_n)(|r^{j_n}a_i|^2\wedge 1)$. We will make this explicit in Lemma 7.8.9, but before that we stress the miracle of this definition: if $i \notin U_n$, we have exactly the information we need to appeal to Theorem 7.8.1.


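Lemma 7.8.7 If $i \notin U_n$ then $|\int_{D_n}\chi_i(s)\,d\mu(s)| \ge \mu(D_n)/2$.

Proof Indeed, since $i \notin U_n$,
$$\mu(D_n) > \int_{D_n}|\chi_i(s)-1|^2\,d\mu(s) = 2\mu(D_n) - 2\,\mathrm{Re}\int_{D_n}\chi_i(s)\,d\mu(s) ,$$
so that $\mathrm{Re}\int_{D_n}\chi_i(s)\,d\mu(s) > \mu(D_n)/2$. □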


Lemma 7.8.8 For complex numbers $x, y$ with $|y| \le 4$ we have
$$|xy|\wedge 1 \ge |y|(|x|\wedge 1)/4 . \tag{7.136}$$

Proof We have $|xy|\wedge 1 \ge ((|x|\wedge 1)|y|)\wedge 1$ and $|y|(|x|\wedge 1) \le 4$. We then use that for $0 \le a \le 4$ we have $a\wedge 1 \ge a/4$. □
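Inequality (7.136) only involves the moduli $|x|$ and $|y|$, so it is easy to test numerically. The sketch below (hypothetical helper names, a finite grid rather than a proof) evaluates both sides over a range of moduli and phases with $|y| \le 4$; the slack is zero for instance when $|x| \ge 1$ and $|y| = 4$, which is why a small rounding tolerance is used.

```python
import cmath

def lemma_788_sides(x, y):
    """Return (|xy| ∧ 1, |y| (|x| ∧ 1) / 4); (7.136) says the first is >= the second when |y| <= 4."""
    return min(abs(x * y), 1.0), abs(y) * min(abs(x), 1.0) / 4

moduli_x = [0.0, 0.01, 0.3, 1.0, 2.5, 10.0]
moduli_y = [0.0, 0.5, 1.0, 3.9, 4.0]          # all satisfy |y| <= 4
phases = [0.0, 1.0, 2.5]
cases = [(rx * cmath.exp(1j * tx), ry * cmath.exp(1j * ty))
         for rx in moduli_x for ry in moduli_y for tx in phases for ty in phases]
worst = min(lhs - rhs for lhs, rhs in (lemma_788_sides(x, y) for x, y in cases))
print(worst)
```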


Lemma 7.8.9 We have
$$\sum_{i\in U_n}|r^{j_n}a_i|^2\wedge 1 \le 4w2^n . \tag{7.137}$$

Proof According to (7.136) we have
$$|r^{j_n}a_i(\chi_i(s)-1)|^2\wedge 1 \ge \frac14|\chi_i(s)-1|^2(|r^{j_n}a_i|^2\wedge 1) ,$$
so that (7.134) implies
$$\sum_i (|r^{j_n}a_i|^2\wedge 1)\int_{D_n}|\chi_i(s)-1|^2\,d\mu(s) \le 4w2^n\mu(D_n) ,$$
from which the result follows since $\int_{D_n}|\chi_i(s)-1|^2\,d\mu(s) \ge \mu(D_n)$ for $i \in U_n$. □
The next task is to squeeze out all the information we can from (7.137). 


Lemma 7.8.10 We have $U_0 = I$.

Proof We have $\int_{T\setminus D_0}|\chi_i(s)-1|^2\,d\mu(s) \le 4\mu(T\setminus D_0) \le 1$ since $|\chi_i(s)-1|^2 \le 4$ and $\mu(T\setminus D_0) \le 1/4$ by (7.129). Thus
$$\int_{D_0}|\chi_i(s)-1|^2\,d\mu(s) = \int_T|\chi_i(s)-1|^2\,d\mu(s) - \int_{T\setminus D_0}|\chi_i(s)-1|^2\,d\mu(s) \ge 2 - 1 = 1 \ge \mu(D_0) ,$$
because $\int_T|\chi_i(s)-1|^2\,d\mu(s) = 2$. □

Let us define
$$V_n = \bigcap_{0\le k\le n}U_k ,$$
so that $V_0 = U_0 = I$, and for $i \in I$ it makes sense to define
$$u_i = \inf\{r^{-j_n} ;\ n \in \mathbb{N},\ i \in V_n\} . \tag{7.138}$$
We then define
$$I_2 = \{i \in I ;\ u_i \le |a_i| < r^{-j_0}\} .$$


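Lemma 7.8.11 We have
$$\sum_{i\in I_2}|a_i| \le Lw\sum_{n\ge0}2^n r^{-j_n} . \tag{7.139}$$

Proof It follows from (7.137) that
$$\mathrm{card}\{i \in U_n ;\ |a_i| \ge r^{-j_n}\} \le 4w2^n \tag{7.140}$$
and consequently
$$\sum_{n\ge1} r^{-j_{n-1}}\,\mathrm{card}\{i \in U_n ;\ |a_i| \ge r^{-j_n}\} \le Lw\sum_{n\ge1}2^n r^{-j_{n-1}} \le Lw\sum_{n\ge0}2^n r^{-j_n} . \tag{7.141}$$
Now, we have
$$\sum_{n\ge1} r^{-j_{n-1}}\,\mathrm{card}\{i \in U_n ;\ |a_i| \ge r^{-j_n}\} = \sum_{i\in I}\sum_{n\ge1} r^{-j_{n-1}}\mathbf{1}_{\{i\in U_n,\ |a_i|\ge r^{-j_n}\}} \ge \sum_{i\in I_2}\sum_{n\ge1} r^{-j_{n-1}}\mathbf{1}_{\{i\in U_n,\ |a_i|\ge r^{-j_n}\}} . \tag{7.142}$$
For $i \in I_2$ we have $|a_i| \ge u_i$, and by definition of $u_i$, there exists $n$ with $r^{-j_n} \le |a_i|$ and $i \in V_n$. Consider the smallest integer $k \ge 0$ such that $r^{-j_k} \le |a_i|$. Since $|a_i| < r^{-j_0}$ we obtain $k \ge 1$, so that $|a_i| < r^{-j_{k-1}}$. Since $i \in V_n \subset U_k$ (because $k \le n$), this shows that $\sum_{n\ge1} r^{-j_{n-1}}\mathbf{1}_{\{i\in U_n,\ |a_i|\ge r^{-j_n}\}} \ge r^{-j_{k-1}} \ge |a_i|$. Thus the right-hand side of (7.142) is $\ge \sum_{i\in I_2}|a_i|$, completing the proof. □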
Proof of Theorem 7.8.6 Let $I_3 = \{i \in I ;\ |a_i| < u_i\}$, so that $I$ is the disjoint union of $I_1, I_2, I_3$. It remains only to prove the inequality (7.133) involving the set $I_3$. For $n \ge 0$ let us set $W_n = I_3 \cap (V_n\setminus V_{n+1})$, so that for $i \in W_n$ we have $i \notin U_{n+1}$, and thus $|\int_{D_{n+1}}\chi_i(s)\,d\mu(s)| \ge \mu(D_{n+1})/2$ by Lemma 7.8.7. Since $\sqrt{\log(2/\mu(D_{n+1}))} \le L2^{n/2}$ by (7.130), it then follows from (7.122) used with $A = D_{n+1}$ that
$$E\Big\|\sum_{i\in W_n}a_i g_i\chi_i\Big\| \le L2^{n/2}\Big(\sum_{i\in W_n}|a_i|^2\Big)^{1/2} . \tag{7.143}$$
Let us bound the right-hand side. For $i \in W_n$ we have $i \in I_3$, so that $|a_i| < u_i$. Since $i \in V_n$ we also have $u_i \le r^{-j_n}$. Thus $|a_i r^{j_n}| \le 1$, and since $W_n \subset V_n \subset U_n$, (7.137) implies $\sum_{i\in W_n}|a_i|^2 \le Lw2^n r^{-2j_n}$. Finally (7.143) implies
$$E\Big\|\sum_{i\in W_n}a_i g_i\chi_i\Big\| \le L\sqrt{w}\,2^n r^{-j_n} .$$
For $i \in I_3$ we have $u_i > 0$ since $|a_i| < u_i$, so that there is a largest $n$ for which $i \in V_n$. Then $i \notin V_{n+1}$, so that $i \in V_n\setminus V_{n+1}$ and thus $i \in W_n$. We have proved that $I_3 = \bigcup_n W_n$. Use of the triangle inequality then implies
$$E\Big\|\sum_{i\in I_3}a_i g_i\chi_i\Big\| \le L\sqrt{w}\sum_{n\ge0}2^n r^{-j_n} . \qquad □$$
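To make the construction concrete, here is a minimal sketch of the sets $D_n$, $U_n$, $V_n$, the levels $u_i$ and the partition $I_1, I_2, I_3$ on a toy example. The choices below are assumptions of the sketch, not of the text: $T$ is the cyclic group $\mathbb{Z}_N$ with the uniform Haar measure, $\chi_k(t) = e^{2\pi i kt/N}$, and only finitely many levels $n$ are used, so $u_i$ is a minimum over those levels rather than an infimum over all $n \ge 0$. The function names are hypothetical. The test checks the disjoint-union property, Lemma 7.8.10 and inequality (7.137).

```python
import cmath
import math

def chi(k, t, N):
    """Character chi_k(t) = exp(2 pi i k t / N) of the cyclic group Z_N."""
    return cmath.exp(2j * math.pi * k * t / N)

def psi(j, t, a, freqs, r, N):
    """psi_j(t, 0) of (7.127): sum_i |r^j a_i (chi_i(t) - 1)|^2 ∧ 1, using chi_i(0) = 1."""
    return sum(min(abs(r ** j * ai * (chi(k, t, N) - 1)) ** 2, 1.0)
               for ai, k in zip(a, freqs))

def bernoulli_decomposition(a, freqs, js, r, w, N):
    """Partition I = I1 ∪ I2 ∪ I3 as in the proof of Theorem 7.8.6, on the group Z_N.

    Only the levels n < len(js) are used, so u_i is a minimum over finitely many n.
    """
    I = range(len(a))
    # D_n of (7.128), as lists of points of Z_N.
    D = [[t for t in range(N) if psi(jn, t, a, freqs, r, N) <= w * 2 ** n]
         for n, jn in enumerate(js)]
    if len(D[0]) < 0.75 * N:
        raise ValueError("condition (7.129) fails: mu(D_0) < 3/4")
    # U_n of (7.135); both sides are multiplied by N.
    U = [{i for i in I
          if sum(abs(chi(freqs[i], s, N) - 1) ** 2 for s in Dn) >= len(Dn)}
         for Dn in D]
    # V_n = U_0 ∩ ... ∩ U_n.
    V, current = [], set(I)
    for Un in U:
        current &= Un
        V.append(set(current))
    # u_i of (7.138), restricted to the available levels.
    u = [min((r ** -jn for jn, Vn in zip(js, V) if i in Vn), default=0.0) for i in I]
    top = r ** -js[0]
    I1 = {i for i in I if abs(a[i]) >= top}          # (7.131)
    I2 = {i for i in I if u[i] <= abs(a[i]) < top}
    I3 = {i for i in I if abs(a[i]) < u[i]}
    return U, u, I1, I2, I3

freqs = [1, 2, 3, 5, 8, 13, 21]
a = [0.5, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
js = [0, 1, 2, 3, 4]
U, u, I1, I2, I3 = bernoulli_decomposition(a, freqs, js, r=2, w=2, N=64)
print(u, I1, I2, I3)
```

The function refuses inputs violating (7.129), since the partition relies on $V_0 = I$, i.e. on Lemma 7.8.10.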


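Corollary 7.8.12 Under the conditions of Theorem 7.8.6, we have
$$\gamma_2(T,d_2) \le Lw\sum_{n\ge0}2^n r^{-j_n} + L\sum_i |a_i|\mathbf{1}_{\{|a_i|\ge r^{-j_0}\}} , \tag{7.144}$$
where the distance $d_2$ is given by $d_2(s,t)^2 = \sum_i |a_i|^2|\chi_i(s)-\chi_i(t)|^2$.

Proof For any set $J$ we have $E\|\sum_{i\in J}a_i g_i\chi_i\| \le L\sum_{i\in J}|a_i|$. Using this for $J = I_1$ gives $E\|\sum_{i\in I_1}a_i g_i\chi_i\| \le L\sum_i|a_i|\mathbf{1}_{\{|a_i|\ge r^{-j_0}\}}$. Using it again for $J = I_2$ and combining with (7.132) gives $E\|\sum_{i\in I_2}a_i g_i\chi_i\| \le L\sum_{i\in I_2}|a_i| \le Lw\sum_{n\ge0}2^n r^{-j_n}$. Combining these two inequalities with (7.133) (and using $\sqrt{w} \le w$ since $w \ge 1$) yields
$$E\Big\|\sum_{i\in I}a_i g_i\chi_i\Big\| \le Lw\sum_{n\ge0}2^n r^{-j_n} + L\sum_i|a_i|\mathbf{1}_{\{|a_i|\ge r^{-j_0}\}} .$$
The result then follows from (7.13). □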
7.8.5 Upper Bounds in the Bernoulli Case

Theorem 7.8.13 Under the conditions of Theorem 7.8.6, for each $p \ge 1$ it holds that
$$\Big(E\sup_{s\in T}\Big|\sum_i\varepsilon_i a_i(\chi_i(s)-\chi_i(0))\Big|^p\Big)^{1/p} \le K(r,p)w\sum_{n\ge0}2^n r^{-j_n} + K(r)\sum_i|a_i|\mathbf{1}_{\{|a_i|\ge r^{-j_0}\}} . \tag{7.145}$$

Proof For a subset $J$ of $I$ define $\Phi(J) = (E\sup_{s\in T}|\sum_{i\in J}\varepsilon_i a_i(\chi_i(s)-\chi_i(0))|^p)^{1/p}$. Observe first that as a consequence of the trivial fact that $|\sum_{i\in J}\varepsilon_i a_i(\chi_i(s)-\chi_i(0))| \le 2\sum_{i\in J}|a_i|$, we have $\Phi(J) \le 2\sum_{i\in J}|a_i|$. Thus $\Phi(I_1) \le 2\sum_i|a_i|\mathbf{1}_{\{|a_i|\ge r^{-j_0}\}}$ and $\Phi(I_2) \le Lw\sum_{n\ge0}2^n r^{-j_n}$. Furthermore, by the triangle inequality, if $J_1$ and $J_2$ are two disjoint subsets of $I$, we have $\Phi(J_1\cup J_2) \le \Phi(J_1) + \Phi(J_2)$. Thus it suffices to prove (7.145) when the summation is restricted to $I_3$. It is then a consequence of (2.66) applied to the process $X_t = \sum_{i\in I_3}\varepsilon_i a_i\chi_i(t)$. Indeed, according to the subgaussian inequality (6.2), this process satisfies the increment condition (2.4) with respect to the distance $d_3$ given by $d_3(s,t)^2 = \sum_{i\in I_3}|a_i(\chi_i(s)-\chi_i(t))|^2$. Furthermore $\gamma_2(T,d_3) \le L\sqrt{w}\sum_{n\ge0}2^n r^{-j_n}$, as follows from (7.133) and (7.13). □


7.8.6 The Main Upper Bound 


In this section we state and prove our main upper bound, Theorem 7.8.14. It will follow from (7.145) given the randomness of the $Z_i$. We recall that $E_\varepsilon$ denotes expectation in the r.v.s $\varepsilon_i$ only. We recall that
$$\varphi_j(s,t) = \sum_{i\ge1}E(|r^j(Z_i(s)-Z_i(t))|^2\wedge 1) .$$

Theorem 7.8.14 For $n \ge 0$ consider numbers $j_n \in \mathbb{Z}$. Assume that
$$\forall s,t \in T ,\quad \varphi_{j_0}(s,t) \le 1 , \tag{7.146}$$
$$\forall n \ge 1 ,\quad \mu(\{s \in T ;\ \varphi_{j_n}(s,0) \le 2^n\}) \ge N_n^{-1} . \tag{7.147}$$
Then for each $p \ge 1$ we can write
$$\Big(E_\varepsilon\sup_{s\in T}\Big|\sum_{i\ge1}\varepsilon_i(Z_i(s)-Z_i(0))\Big|^p\Big)^{1/p} \le Y_1 + Y_2 , \tag{7.148}$$
where
$$(EY_1^p)^{1/p} \le K(r,p)\sum_{n\ge0}2^n r^{-j_n} \tag{7.149}$$
and
$$Y_2 \le K(r)\sum_{i\ge1}|Z_i(0)|\mathbf{1}_{\{|Z_i(0)|\ge r^{-j_0}\}} . \tag{7.150}$$


In words, $Y_2$ collects the contributions of the “large” terms, and we control well all moments of $Y_1$. It is by no means obvious how to control the term $Y_2$, and it will be a separate task to learn how to do this. The difficulty of that task will depend on our goal. The simplest situation will be when we apply this result to study the convergence of series: then $P(Y_2 \ne 0)$ will be small.

The reader is invited to meditate on the strength of this result. In particular, we do not know how to deduce it from the decomposition theorem, even in the precise form of Corollary 7.9.2.⁴¹

The main step of the proof is as follows. It will allow us to use Theorem 7.8.13 given the randomness of the $Z_i$, which is symbolized by $\omega$.


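Proposition 7.8.15 Under the conditions of Theorem 7.8.14, denote by $w(\omega) \in \mathbb{R}^+\cup\{\infty\}$ the smallest number $w \ge 1$ for which
$$\mu\Big(\Big\{s \in T ;\ \sum_{i\ge1}|r^{j_0}(Z_i(s)-Z_i(0))|^2\wedge 1 \le w\Big\}\Big) \ge \frac34 , \tag{7.151}$$
$$\forall n \ge 1 ,\quad \mu\Big(\Big\{s \in T ;\ \sum_{i\ge1}|r^{j_n}(Z_i(s)-Z_i(0))|^2\wedge 1 \le w2^n\Big\}\Big) \ge N_{n+1}^{-1} . \tag{7.152}$$
Then the r.v. $w(\omega)$ satisfies $P(w(\omega) \ge u) \le L\exp(-u/L)$. In particular $Ew(\omega)^p \le K(r,p)$ for each $p$.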


Proof of Theorem 7.8.14 Given the r.v.s $Z_i$, consider the quantities $\psi_j(s,t) = \sum_i|r^j(Z_i(s)-Z_i(t))|^2\wedge 1$. As we forcefully pointed out, given the randomness of the $Z_i$, each $Z_i$ is of the type $a_i\chi_i$ for a number $a_i$ and a character $\chi_i$. Thus (7.151) implies (7.129) and (7.152) implies (7.130) with $w = w(\omega)$. We are then within the hypotheses of Theorem 7.8.13, and (7.145) implies (7.148), where $Y_2$ is as in (7.150) and where $Y_1 \le K(r,p)w(\omega)\sum_{n\ge0}2^n r^{-j_n}$, from which (7.149) follows using the previous proposition. □


4! The specific problem is to show that the terms Zz} satisfy (7.148) for p > 2. For p < 2 this can 
be shown using the same arguments as in Theorem 7.3.4. 
To prepare the proof of Proposition 7.8.15 we set $D_0 = T$ and for $n \ge 1$ we set
$$D_n = \{s \in T ;\ \varphi_{j_n}(s,0) \le 2^n\} ,$$
so that by (7.147), for $n \ge 0$ we have $\mu(D_n) \ge 1/N_n$. The strategy to follow is then obvious: given $n$, it follows from Lemma 7.7.2 that for any point $s \in D_n$ it is very rare that $\sum_{i\ge1}|r^{j_n}(Z_i(s)-Z_i(0))|^2\wedge 1$ is much larger than $2^n$. Thus, by Fubini's theorem, it is very rare that the set of points $s \in D_n$ with this property comprises more than 1/4 of the points of $D_n$ (this event is the complement of the event $\Omega_{n,u}$ below). Thus, with probability close to one, the events $\Omega_{n,u}$ occur simultaneously for all $n$.


Lemma 7.8.16 Consider a parameter $u \ge 1$. For each $n \ge 0$ consider the random subset $B_{n,u}$ of $D_n$ defined as follows:
$$B_{n,u} = \Big\{s \in D_n ;\ \sum_{i\ge1}|r^{j_n}(Z_i(s)-Z_i(0))|^2\wedge 1 \le u2^{n+2}\Big\} . \tag{7.153}$$
Then the event $\Omega_{n,u}$ defined by
$$\Omega_{n,u} = \{\omega \in \Omega ;\ \mu(B_{n,u}) \ge 3\mu(D_n)/4\}$$
(where as usual $\omega$ symbolizes the randomness of the $Z_i$) satisfies
$$P(\Omega_{n,u}) \ge 1 - 4\exp(-u2^{n+1}) . \tag{7.154}$$


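Proof Consider $s \in D_n$ and $W_i = |r^{j_n}(Z_i(s)-Z_i(0))|^2\wedge 1$, so that $\sum_{i\ge1}EW_i = \varphi_{j_n}(s,0) \le 2^n$. It follows from Lemma 7.7.2 (b), used with $A = u2^{n+2}$, that
$$P(s \notin B_{n,u}) = P\Big(\sum_{i\ge1}|r^{j_n}(Z_i(s)-Z_i(0))|^2\wedge 1 > u2^{n+2}\Big) \le \delta_n := \exp(-u2^{n+1}) .$$
Then we have
$$E\mu(D_n\setminus B_{n,u}) = \int_{D_n}P(s \notin B_{n,u})\,d\mu(s) \le \delta_n\mu(D_n) . \tag{7.155}$$
Consequently $P(\mu(D_n\setminus B_{n,u}) \ge \mu(D_n)/4) \le 4\delta_n$ by Markov's inequality. Therefore the event $\Omega_{n,u}$ defined by $\mu(B_{n,u}) \ge 3\mu(D_n)/4$ satisfies
$$P(\Omega_{n,u}) = P\Big(\mu(B_{n,u}) \ge \frac34\mu(D_n)\Big) = P\Big(\mu(D_n\setminus B_{n,u}) \le \frac14\mu(D_n)\Big) \ge 1 - 4\delta_n ,$$
and we have proved (7.154). □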
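The tail estimate used above is taken here in the form in which it is applied in this proof (our reading of Lemma 7.7.2 (b)): if $0 \le W_i \le 1$ are independent with $\sum_i EW_i \le S$, then $P(\sum_i W_i \ge A) \le \exp(-A/2)$ for $A \ge 4S$. A quick way to see it is Markov's inequality applied to $\exp(\sum_i W_i)$, together with $Ee^{W_i} \le 1 + (e-1)EW_i$, which gives $\exp(-A + (e-1)S) \le \exp(-A/2)$. The sketch below checks this chain exactly for Bernoulli $W_i$, whose sum has a computable law; the example parameters are arbitrary assumptions.

```python
import math

def poisson_binomial_pmf(ps):
    """Exact law of W_1 + ... + W_n for independent W_i ~ Bernoulli(p_i), by dynamic programming."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            new[k] += q * (1 - p)      # W_i = 0
            new[k + 1] += q * p        # W_i = 1
        pmf = new
    return pmf

def tail(pmf, A):
    """P(W_1 + ... + W_n >= A)."""
    return sum(q for k, q in enumerate(pmf) if k >= A)

ps = [0.1] * 20
S = sum(ps)                            # sum of E W_i, about 2
pmf = poisson_binomial_pmf(ps)
rows = []
for A in range(math.ceil(4 * S), len(ps) + 1):   # the regime A >= 4S
    chernoff = math.exp(-A + (math.e - 1) * S)   # Markov applied to exp(sum W_i)
    rows.append((A, tail(pmf, A), chernoff, math.exp(-A / 2)))
for row in rows:
    print(row)
```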
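Exercise 7.8.17 Write a detailed proof of (7.155).

Proof of Proposition 7.8.15 We recall the sets $B_{n,u}$ and the events $\Omega_{n,u}$ of Lemma 7.8.16, and we define
$$\Omega_u = \bigcap_{n\ge0}\Omega_{n,u} , \tag{7.156}$$
so that from (7.154)
$$P(\Omega_u) \ge 1 - L\exp(-u) . \tag{7.157}$$
For $\omega \in \Omega_u$ we have $\mu(B_{0,u}) \ge 3/4$ (since $D_0 = T$) and, for $n \ge 1$, $\mu(B_{n,u}) \ge 3\mu(D_n)/4 \ge N_{n+1}^{-1}$ (because $\mu(D_n) \ge N_n^{-1}$ and $N_{n+1} = N_n^2 \ge 4N_n/3$). Since $u2^{n+2} = 4u\,2^n$, the sets $B_{n,u}$ show that (7.151) and (7.152) hold for $w = 4u$, so that by the very definition of $w(\omega)$ we have $w(\omega) \le 4u$. Thus $P(w(\omega) > 4u) \le P(\Omega_u^c) \le L\exp(-u)$. □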
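The passage from (7.154) to (7.157) is a union bound: $P(\Omega_u^c) \le \sum_{n\ge0}4\exp(-u2^{n+1})$. The small computation below (a sketch; the truncation at 60 levels is harmless since later terms underflow to zero) shows the ratio of this sum to $e^{-u}$. Because each term $4e^{-u(2^{n+1}-1)}$ decreases in $u$, the ratio is decreasing, so a constant of about $1.7$ suffices for all $u \ge 1$.

```python
import math

def complement_bound(u, levels=60):
    """sum_{n >= 0} 4 exp(-u 2^(n+1)), the bound on P(Omega_u^c) obtained from (7.154)."""
    return sum(4 * math.exp(-u * 2 ** (n + 1)) for n in range(levels))

ratios = {u: complement_bound(u) / math.exp(-u) for u in (1, 2, 5, 10, 50)}
print(ratios)
```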


7.8.7 Sums with Few Non-zero Terms 


To complete the proof of Theorem 7.5.5 using Theorem 7.8.14, we need to learn how to control $EY_2^p$. This is the content of the next result.

Theorem 7.8.18 Assume (7.66), i.e.
$$\forall s,t \in T ,\quad \varphi_{j_0}(s,t) \le 1 . \tag{7.66}$$
Then for each $p \ge 1$ the variables $\zeta_i := |Z_i(0)|\mathbf{1}_{\{|Z_i(0)|\ge r^{-j_0}\}}$ satisfy
$$\Big(E\Big(\sum_{i\ge1}\zeta_i\Big)^p\Big)^{1/p} \le K\Big(r^{-j_0} + \Big(E\Big|\sum_{i\ge1}\varepsilon_i Z_i(0)\Big|^p\Big)^{1/p}\Big) . \tag{7.158}$$

Proof of Theorem 7.5.5 We use Theorem 7.8.14. We raise (7.148) to the power $p$, we use that $(Y_1+Y_2)^p \le K(Y_1^p + Y_2^p)$, and we take expectation. The control of $EY_1^p$ is provided by (7.149), and the control of $EY_2^p$ is provided by (7.158). □


The basic reason we shall succeed in proving Theorem 7.8.18 is that typically only a few of the r.v.s $\zeta_i$ will be non-zero, a fact which motivates the title of this section. Our first goal is to prove this. We start with a simple fact.

Lemma 7.8.19 For any $j \in \mathbb{Z}$ we have
$$\sum_i E(|2r^jZ_i(0)|^2\wedge 1) \le 2\sup_{s,t\in T}\sum_i E(|r^j(Z_i(s)-Z_i(t))|^2\wedge 1) . \tag{7.159}$$
Thus if $j_0$ satisfies (7.66) then
$$\sum_i E(|2r^{j_0}Z_i(0)|^2\wedge 1) \le 2 , \tag{7.160}$$
and in particular
$$\sum_i P(|Z_i(0)| \ge r^{-j_0}) \le 2 . \tag{7.161}$$

One should stress the interesting nature of this statement: a control on the size of the differences $Z_i(s) - Z_i(t)$ implies a control of the size of $Z_i(0)$. The hypothesis (7.58) that $Z_i \notin \mathcal{C}_1$ a.e. is essential here.


Proof Since $Z_i \in \mathcal{C}_G$, we have $Z_i(s) = \chi(s)Z_i(0)$ for a certain (random) character $\chi$, and since by (7.58) $\chi \ne 1$ a.e., (7.34) implies that a.e.
$$\int|Z_i(s)-Z_i(0)|^2\,d\mu(s) = 2|Z_i(0)|^2 , \tag{7.162}$$
whereas $|Z_i(s)-Z_i(0)|^2 \le 2|Z_i(s)|^2 + 2|Z_i(0)|^2 = 4|Z_i(0)|^2$. Now for $x \ge 0$ the function $\psi(x) = (r^{2j}x)\wedge 1$ is concave with $\psi(0) = 0$, so it satisfies $x\psi(y) \le y\psi(x)$ for $x \le y$. Using this for $x = |Z_i(s)-Z_i(0)|^2$, $y = 4|Z_i(0)|^2 = |2Z_i(0)|^2$, and integrating in $s$ with respect to $\mu$, we obtain $\psi(|2Z_i(0)|^2) \le 2\int\psi(|Z_i(s)-Z_i(0)|^2)\,d\mu(s)$, and taking expectation, we obtain
$$E(|2r^jZ_i(0)|^2\wedge 1) \le 2\int E((r^{2j}|Z_i(s)-Z_i(0)|^2)\wedge 1)\,d\mu(s) .$$
Summation over $i$ then makes (7.159) obvious, and (7.161) follows since $P(|Z_i(0)| \ge r^{-j_0}) \le E(|2r^{j_0}Z_i(0)|^2\wedge 1)$. □
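For a single nonrandom term $Z(s) = \chi_k(s)z_0$ the key step of this proof can be checked by direct computation. The sketch below assumes, for illustration only, the group $\mathbb{Z}_N$ with uniform measure and $\chi_k(s) = e^{2\pi iks/N}$ with $k \not\equiv 0 \bmod N$; the function names are hypothetical. It computes the average of $|Z(s) - Z(0)|^2$, which should equal $2|z_0|^2$ as in (7.162), and both sides of $\psi(|2Z(0)|^2) \le 2\int\psi(|Z(s)-Z(0)|^2)\,d\mu(s)$. When no truncation occurs, $\psi$ is linear and the two sides coincide.

```python
import cmath
import math

def psi(x, r, j_level):
    """psi(x) = (r^(2 j) x) ∧ 1: concave on [0, inf) with psi(0) = 0."""
    return min(r ** (2 * j_level) * x, 1.0)

def lemma_7819_sides(z0, k, N, r, j_level):
    """For the nonrandom Z(s) = chi_k(s) z0 on Z_N, with chi_k(s) = exp(2 pi i k s / N), return
    psi(|2 Z(0)|^2), 2 * mean_s psi(|Z(s) - Z(0)|^2), and mean_s |Z(s) - Z(0)|^2."""
    diffs = [abs(cmath.exp(2j * math.pi * k * s / N) * z0 - z0) ** 2 for s in range(N)]
    lhs = psi(abs(2 * z0) ** 2, r, j_level)
    rhs = 2 * sum(psi(d, r, j_level) for d in diffs) / N
    return lhs, rhs, sum(diffs) / N

print(lemma_7819_sides(0.3 + 0.4j, 5, 64, 2, 0))
```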


Exercise 7.8.20 Instead of (7.66) assume that $\mu(\{s \in T ;\ \varphi_{j_0}(s,0) \le 1\}) \ge 3/4$. Prove that $\sum_i E(|2r^{j_0}Z_i(0)|^2\wedge 1) \le 4$.

So, as promised earlier, (7.161) means that typically only a few of the r.v.s $\zeta_i = |Z_i(0)|\mathbf{1}_{\{|Z_i(0)|\ge r^{-j_0}\}}$ can be non-zero at the same time. To lighten notation, in the next few pages, $K = K(p)$ denotes a number depending on $p$ only. Also, even though the sums we consider are finite sums $\sum_i$, it is convenient for notation to write them as infinite sums $\sum_{i\ge1}$, with terms which are eventually zero.

We start the study of sums of independent r.v.s with only a few non-zero terms. 
Lemma 7.8.21 Consider independent centered complex-valued r.v.s $\theta_i$. Assume that⁴²
$$\sum_{i\ge1}P(\theta_i \ne 0) \le 8 . \tag{7.163}$$
Then, for each $p \ge 1$, we have
$$\sum_{i\ge1}E|\theta_i|^p \le KE\Big|\sum_{i\ge1}\theta_i\Big|^p . \tag{7.164}$$

The intuition here is that because the typical number of non-zero values of $\theta_i$ is about 1, it is not surprising that $|\sum_{i\ge1}\theta_i|^p$ should be comparable to $\sum_{i\ge1}|\theta_i|^p$.

Lemma 7.8.22 Consider independent events $\Xi_i$ with $P(\Xi_i) \le 1/2$ and $\sum_i P(\Xi_i) \le 8$. Then $P(\bigcup_i\Xi_i) \le 1 - e^{-16}$.

Proof We use that $1 - x \ge \exp(-2x)$ for $0 \le x \le 1/2$, so that
$$1 - P\Big(\bigcup_i\Xi_i\Big) = \prod_i(1 - P(\Xi_i)) \ge \prod_i\exp(-2P(\Xi_i)) \ge \exp(-16) . \qquad □$$
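Both ingredients of Lemma 7.8.22 are easy to evaluate: the elementary inequality $1 - x \ge e^{-2x}$ on $[0, 1/2]$ (a concave function above a convex one, equal at $x = 0$), and the exact formula $P(\bigcup_i\Xi_i) = 1 - \prod_i(1 - P(\Xi_i))$ for independent events. The sketch below checks them on a grid and on a few arbitrary probability vectors satisfying the hypotheses; it is an illustration, not a proof.

```python
import math

def union_probability(ps):
    """P(union of independent events with probabilities ps) = 1 - prod(1 - p_i)."""
    return 1 - math.prod(1 - p for p in ps)

# The elementary inequality 1 - x >= exp(-2x) on [0, 1/2], checked on a grid.
grid_ok = all(1 - x >= math.exp(-2 * x) for x in (k / 1000 for k in range(501)))

examples = ([0.5] * 16, [0.01] * 800, [0.3, 0.2, 0.5, 0.1] * 5)
bound = 1 - math.exp(-16)
for ps in examples:
    print(sum(ps), union_probability(ps), union_probability(ps) <= bound)
```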


Proof of Lemma 7.8.21 From (7.163) there are at most 16 indices $i$ with $P(\theta_i \ne 0) \ge 1/2$. We can assume without loss of generality that for $i \ge 17$ we have $P(\theta_i \ne 0) < 1/2$. As a consequence of Jensen's inequality,⁴³ for any set $J$ of indices, we have $E|\sum_{i\in J}\theta_i|^p \le E|\sum_{i\ge1}\theta_i|^p$. In particular for any index $i_0$, we have $E|\theta_{i_0}|^p \le E|\sum_{i\ge1}\theta_i|^p$. Thus $\sum_{i\le16}E|\theta_i|^p \le 16E|\sum_{i\ge1}\theta_i|^p$ and $E|\sum_{i\ge17}\theta_i|^p \le E|\sum_{i\ge1}\theta_i|^p$. Therefore it suffices to prove that $\sum_{i\ge17}E|\theta_i|^p \le LE|\sum_{i\ge17}\theta_i|^p$. Consequently, we may assume $P(\theta_i \ne 0) < 1/2$ for each $i$.

For $n \ge 1$ consider $\Omega_n = \{\exists i \le n,\ \theta_i \ne 0\}$. Then $P(\Omega_n) \le 1 - e^{-16}$ by Lemma 7.8.22, so that $P(\Omega_n^c) \ge e^{-16}$.

We prove by induction on $n$ that
$$e^{-16}\sum_{i\le n}E|\theta_i|^p \le E\Big|\sum_{i\le n}\theta_i\Big|^p . \tag{7.165}$$
It is obvious that (7.165) holds for $n = 1$. Assuming it holds for $n$, we have
$$E\Big|\sum_{i\le n+1}\theta_i\Big|^p = E\mathbf{1}_{\Omega_n}\Big|\sum_{i\le n+1}\theta_i\Big|^p + E\mathbf{1}_{\Omega_n^c}\Big|\sum_{i\le n+1}\theta_i\Big|^p . \tag{7.166}$$

42 The number 8 does not play a special role and can be replaced by any other number.
43 If this is not obvious to you, please review Exercise 6.2.1 (c).
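Now, since $\theta_{n+1}$ is centered and independent of both $\Omega_n$ and $\sum_{i\le n}\theta_i$, Jensen's inequality implies
$$E\mathbf{1}_{\Omega_n}\Big|\sum_{i\le n+1}\theta_i\Big|^p \ge E\mathbf{1}_{\Omega_n}\Big|\sum_{i\le n}\theta_i\Big|^p = E\Big|\sum_{i\le n}\theta_i\Big|^p . \tag{7.167}$$
Since for $i \le n$ we have $\theta_i = 0$ on $\Omega_n^c$,
$$E\mathbf{1}_{\Omega_n^c}\Big|\sum_{i\le n+1}\theta_i\Big|^p = E\mathbf{1}_{\Omega_n^c}|\theta_{n+1}|^p = P(\Omega_n^c)E|\theta_{n+1}|^p \ge e^{-16}E|\theta_{n+1}|^p ,$$
using independence in the second equality and that $P(\Omega_n^c) \ge e^{-16}$. Combining with (7.166) and (7.167) and using the induction hypothesis completes the induction. □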


Lemma 7.8.23 Consider independent r.v.s $\eta_i \ge 0$ with $\sum_{i\ge1}P(\eta_i > 0) \le 8$. Then for each $p \ge 1$,
$$E\Big(\sum_{i\ge1}\eta_i\Big)^p \le K\sum_{i\ge1}E\eta_i^p . \tag{7.168}$$

Again, the intuition here is that the typical number of non-zero values of $\eta_i$ is about 1, so that $(\sum_{i\ge1}\eta_i)^p$ is not much larger than $\sum_{i\ge1}\eta_i^p$.

Proof The starting point of the proof is the inequality
$$(a+b)^p \le a^p + K(a^{p-1}b + b^p) , \tag{7.169}$$
where $a, b \ge 0$. This is elementary, by distinguishing the cases $b \le a$ and $b \ge a$.
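One admissible constant in (7.169) can be made explicit; the choice below is ours, not the text's. If $b \le a$, the mean value theorem gives $(a+b)^p - a^p \le pb(2a)^{p-1}$; if $b \ge a$, then $(a+b)^p \le (2b)^p$. Hence $K = 2^{p-1}\max(p, 2)$ works. The sketch below (hypothetical helper name) evaluates the ratio of the two sides on a small grid of $a, b \ge 0$ and several $p \ge 1$.

```python
def sides_7169(a, b, p):
    """Both sides of (a + b)^p <= a^p + K (a^(p-1) b + b^p) for a, b >= 0, p >= 1,
    with the explicit (non-optimized) choice K = 2^(p-1) * max(p, 2)."""
    K = 2 ** (p - 1) * max(p, 2)
    return (a + b) ** p, a ** p + K * (a ** (p - 1) * b + b ** p)

vals = [0.0, 0.1, 0.5, 1.0, 2.0, 7.5]
worst_ratio = max(lhs / rhs
                  for p in (1, 1.5, 2, 3, 6) for a in vals for b in vals
                  for lhs, rhs in [sides_7169(a, b, p)] if rhs > 0)
print(worst_ratio)
```

The ratio reaches 1 only in the trivial case $b = 0$, where both sides equal $a^p$.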


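Let $S_n = \sum_{i\le n}\eta_i$, so that using (7.169) for $a = S_n$ and $b = \eta_{n+1}$ and taking expectation, we obtain
$$ES_{n+1}^p \le ES_n^p + K(ES_n^{p-1}\eta_{n+1} + E\eta_{n+1}^p) . \tag{7.170}$$
Let $\alpha_n = P(\eta_n > 0)$. From Hölder's inequality, we get
$$ES_n^{p-1} \le (ES_n^p)^{(p-1)/p} ,\qquad E\eta_{n+1} \le \alpha_{n+1}^{(p-1)/p}(E\eta_{n+1}^p)^{1/p} .$$
Using independence then implies
$$ES_n^{p-1}\eta_{n+1} = ES_n^{p-1}\,E\eta_{n+1} \le (\alpha_{n+1}ES_n^p)^{(p-1)/p}(E\eta_{n+1}^p)^{1/p} .$$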


260 7 Random Fourier Series and Trigonometric Sums 


Now, for numbers $a, b \ge 0$, Young's inequality implies that $a^{(p-1)/p}b^{1/p} \le a + b$, and consequently
$$ES_n^{p-1}\eta_{n+1} \le \alpha_{n+1}ES_n^p + E\eta_{n+1}^p .$$
Combining with (7.170) yields
$$ES_{n+1}^p \le ES_n^p(1 + K\alpha_{n+1}) + KE\eta_{n+1}^p \le (ES_n^p + KE\eta_{n+1}^p)(1 + K\alpha_{n+1}) .$$
In particular we obtain by induction on $n$ that
$$ES_n^p \le K\Big(\sum_{i\le n}E\eta_i^p\Big)\prod_{i\le n}(1 + K\alpha_i) ,$$
which concludes the proof since $\sum_{i\ge1}\alpha_i \le 8$ by hypothesis. □

We now have all the tools to prove our main result.


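Proof of Theorem 7.8.18 Let us define $\theta_i := Z_i(0)\mathbf{1}_{\{|Z_i(0)|\ge r^{-j_0}\}}$, so that $\zeta_i = |\theta_i| = |\varepsilon_i\theta_i|$, and (7.161) implies
$$\sum_{i\ge1}P(\varepsilon_i\theta_i \ne 0) = \sum_{i\ge1}P(\zeta_i \ne 0) \le 2 . \tag{7.171}$$
Using (7.168) in the first inequality and (7.164) (for $\varepsilon_i\theta_i$ rather than $\theta_i$) in the second one, we obtain
$$E\Big(\sum_{i\ge1}\zeta_i\Big)^p \le K\sum_{i\ge1}E\zeta_i^p \le KE\Big|\sum_{i\ge1}\varepsilon_i\theta_i\Big|^p .$$
Let $\theta_i' := Z_i(0) - \theta_i = Z_i(0)\mathbf{1}_{\{|Z_i(0)|<r^{-j_0}\}}$, so that $\theta_i = Z_i(0) - \theta_i'$ and thus
$$E\Big|\sum_{i\ge1}\varepsilon_i\theta_i\Big|^p \le KE\Big|\sum_{i\ge1}\varepsilon_iZ_i(0)\Big|^p + KE\Big|\sum_{i\ge1}\varepsilon_i\theta_i'\Big|^p ,$$
and in order to prove (7.158), it suffices to prove that
$$E\Big|\sum_{i\ge1}\varepsilon_i\theta_i'\Big|^p \le Kr^{-j_0p} . \tag{7.172}$$
First, Khinchin's inequality (6.3) implies
$$E_\varepsilon\Big|\sum_{i\ge1}\varepsilon_i\theta_i'\Big|^p \le K\Big(\sum_{i\ge1}|\theta_i'|^2\Big)^{p/2} . \tag{7.173}$$
The r.v.s $W_i = r^{2j_0}|\theta_i'|^2$ satisfy $0 \le W_i \le 1$ and $\sum_{i\ge1}EW_i \le 2$ by (7.160). Lemma 7.7.2 (b) provides the estimate $P(\sum_{i\ge1}W_i \ge t) \le \exp(-t/2)$ for $t \ge 8$, and this implies $E(\sum_{i\ge1}W_i)^{p/2} \le K$. Consequently taking expectation in (7.173) yields (7.172) and completes the proof. □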


7.9 Proof of the Decomposition Theorem 


7.9.1 Constructing Decompositions 


Consider independent r.v.s $Z_i \in \mathcal{C}_G$ and assume (7.58) as always. Our approach will be based on the decomposition (7.60): $Z_i = \sum_{\ell\ge1}\xi_{i,\ell}\chi_\ell$, where the r.v.s $(\xi_{i,\ell})_{\ell\ge1}$ are valued in $\mathbb{C}$ and have “disjoint supports”.⁴⁴ We will use many times that for a function $h$ with $h(0) = 0$ and r.v.s $f_\ell$ with disjoint supports, we have
$$h\Big(\sum_\ell f_\ell\Big) = \sum_\ell h(f_\ell) .$$
In particular, using that for each $i$ the functions $(\xi_{i,\ell})_{\ell\ge1}$ have disjoint supports, (7.63) becomes
$$\varphi_j(s,t) = \sum_i E(|r^j(Z_i(s)-Z_i(t))|^2\wedge 1) = \sum_{i,\ell}E(|r^j\xi_{i,\ell}(\chi_\ell(s)-\chi_\ell(t))|^2\wedge 1) . \tag{7.174}$$


Consider a parameter $w$ (which will be useful for later purposes) and integers $(j_n)_{n\ge0}$. For $n \ge 0$ we set
$$D_n = \{s \in T ;\ \varphi_{j_n}(s,0) \le w2^n\} . \tag{7.175}$$
Let us then assume that
$$\mu(D_0) \ge \frac34 ;\qquad \forall n \ge 1 ,\quad \mu(D_n) \ge \frac1{N_{n+1}} . \tag{7.176}$$


Proposition 7.9.1 Under the preceding condition (7.176), there exist truncation levels $u_\ell \ge 0$ with the following properties. First,
$$\sum_{i,\ell\ge1}E\big(|\xi_{i,\ell}|\mathbf{1}_{\{u_\ell\le|\xi_{i,\ell}|<r^{-j_0}\}}\big) \le Lw\sum_{n\ge0}2^n r^{-j_n} . \tag{7.177}$$
Moreover, setting
$$a_\ell = \Big(\sum_i E|\xi_{i,\ell}|^2\mathbf{1}_{\{|\xi_{i,\ell}|<u_\ell\}}\Big)^{1/2} , \tag{7.178}$$
then
$$E\Big\|\sum_{\ell\ge1}a_\ell g_\ell\chi_\ell\Big\| \le L\sqrt{w}\sum_{n\ge0}2^n r^{-j_n} , \tag{7.179}$$
where $(g_\ell)_{\ell\ge1}$ are standard independent Gaussian r.v.s. Finally
$$\sum_i P(|Z_i(0)| \ge r^{-j_0}) \le 4w . \tag{7.180}$$

The proof of Proposition 7.9.1 occupies Sect. 7.9.2.

44 We use this expression as a shorthand for the following property: for each $i$ we have $\xi_{i,\ell}\xi_{i,\ell'} = 0$ a.s. if $\ell \ne \ell'$. In any given realization of the sequence $(\xi_{i,\ell})_{\ell\ge1}$, at most one term is not zero.


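Corollary 7.9.2 Under Condition (7.176) we can find a decomposition $Z_i = Z_i^1 + Z_i^2 + Z_i^3$ such that for each $s = 1,2,3$ the sequence $(Z_i^s)_i$ is independent, valued in $\mathcal{C}_G$,⁴⁵ and moreover
$$\sum_i P(Z_i^1 \ne 0) \le 4w , \tag{7.181}$$
$$\sum_i E|Z_i^2(0)| \le Lw\sum_{n\ge0}2^n r^{-j_n} , \tag{7.182}$$
$$\gamma_2(T,d) \le L\sqrt{w}\sum_{n\ge0}2^n r^{-j_n} , \tag{7.183}$$
where the distance $d$ is given by $d(s,t)^2 = \sum_i E|Z_i^3(s) - Z_i^3(t)|^2$.

Proof We write
$$Z_i^1 = Z_i\mathbf{1}_{\{|Z_i(0)|\ge r^{-j_0}\}} , \tag{7.184}$$
$Z_i^2 = \sum_{\ell\ge1}\xi_{i,\ell}\mathbf{1}_{\{u_\ell\le|\xi_{i,\ell}|<r^{-j_0}\}}\chi_\ell$ and $Z_i^3 = \sum_{\ell\ge1}\xi_{i,\ell}\mathbf{1}_{\{|\xi_{i,\ell}|<u_\ell\}}\chi_\ell$, so that, using (7.178) and the fact that the r.v.s $(\xi_{i,\ell})_{\ell\ge1}$ have disjoint supports, we have
$$d(s,t)^2 = \sum_i E|Z_i^3(s)-Z_i^3(t)|^2 = \sum_i\sum_{\ell\ge1}E\big|\xi_{i,\ell}\mathbf{1}_{\{|\xi_{i,\ell}|<u_\ell\}}(\chi_\ell(s)-\chi_\ell(t))\big|^2$$
$$= \sum_{\ell\ge1}\sum_i E\big(|\xi_{i,\ell}|^2\mathbf{1}_{\{|\xi_{i,\ell}|<u_\ell\}}\big)|\chi_\ell(s)-\chi_\ell(t)|^2 = \sum_{\ell\ge1}a_\ell^2|\chi_\ell(s)-\chi_\ell(t)|^2 .$$
Thus $d$ is the canonical distance associated with the process $\sum_{\ell\ge1}a_\ell g_\ell\chi_\ell$, and (7.183) follows from (7.179) and (7.13). The rest is obvious: (7.181) follows from (7.180) and (7.182) follows from (7.177). □

45 Thus, if $Z_i = \xi_i\chi_i$, then each $Z_i^1, Z_i^2, Z_i^3$ is of the type $\xi_i'\chi_i$.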
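Because each realization of $(\xi_{i,\ell})_{\ell\ge1}$ has at most one non-zero term, which then equals $Z_i(0)$, the three pieces of the decomposition amount to cutting one scalar value into three ranges. The sketch below illustrates this on a single value; the helper name and the numbers are hypothetical, and it assumes $u_\ell \le r^{-j_0}$, which holds in the text because $V_0 = \mathbb{N}^*$. It checks that exactly one piece carries the value and that the pieces add back to it.

```python
def split_term(xi, u_l, top):
    """Split one value xi = xi_{i,l} into the three pieces behind Z_i^1, Z_i^2, Z_i^3 of (7.184):
    |xi| >= top (top = r^{-j_0}), u_l <= |xi| < top, and |xi| < u_l.
    Requires u_l <= top, which holds in the text since V_0 = N^*."""
    if u_l > top:
        raise ValueError("need u_l <= r^{-j_0}")
    big = xi if abs(xi) >= top else 0
    middle = xi if u_l <= abs(xi) < top else 0
    small = xi if abs(xi) < u_l else 0
    return big, middle, small

print(split_term(0.3, 0.05, 0.25), split_term(0.1j, 0.05, 0.25), split_term(-0.01, 0.05, 0.25))
```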


Proof of Theorem 7.5.14 Let us set $S = E\|\sum_i\varepsilon_iZ_i\|$, so that by Markov's inequality (7.65) holds for $M = S/\alpha_0$. We then deduce from Theorem 7.5.1 that there exist integers $(j_n)_{n\ge0}$ for which $\sum_{n\ge0}2^n r^{-j_n} \le LS$ and which satisfy (7.66) and (7.67). In particular, (7.176) holds for $w = 1$. We then apply Corollary 7.9.2 (still with $w = 1$) to obtain a decomposition $Z_i = Z_i^1 + Z_i^2 + Z_i^3$. We set $Z_i' = Z_i^1 + Z_i^2$ and $Z_i'' = Z_i^3$. We note that (7.94) follows from (7.184). To prove Theorem 7.5.14, it therefore suffices to prove that $E\sum_i|Z_i'(0)| \le LS$. Since $E\sum_i|Z_i^2(0)| \le LS$ by (7.182), it suffices to prove that $E\sum_i|Z_i^1(0)| \le LS$. Recalling the expression (7.184) of $Z_i^1$ and setting $\theta_i = Z_i(0)\mathbf{1}_{\{|Z_i(0)|\ge r^{-j_0}\}}$, it suffices to prove that
$$E\sum_i|\theta_i| \le LS . \tag{7.185}$$
Now $\sum_iP(\theta_i \ne 0) \le 4$ by (7.181). Then using (7.164) for $p = 1$ proves that $E\sum_i|\theta_i| \le LE|\sum_i\varepsilon_i\theta_i|$. But, using (6.17) in the third inequality,
$$E\Big|\sum_i\varepsilon_i\theta_i\Big| = EE_\varepsilon\Big|\sum_i\varepsilon_i\theta_i\Big| \le E\Big(\sum_i|\theta_i|^2\Big)^{1/2} \le E\Big(\sum_i|Z_i(0)|^2\Big)^{1/2} \le LEE_\varepsilon\Big|\sum_i\varepsilon_iZ_i(0)\Big| = LE\Big|\sum_i\varepsilon_iZ_i(0)\Big| \le LS . \tag{7.186}$$
We have proved (7.185) and completed the proof. □


7.9.2 Proof of Proposition 7.9.1 


The reader should master the proof of the simpler Theorem 7.8.6 before attempting to read this more complicated argument. It uses essentially the same idea, which we spell out in the simpler case where $Z_i = \xi_i\chi_i$ for a nonrandom character $\chi_i$. The essential step is to construct the truncation level $u_i$ at which we truncate $\xi_i$. It is given by the formula $u_i = \inf\{r^{-j_n}\}$, where the infimum is taken over the values of $n$ such that for each $k \le n$ we have $\int_{D_k}|\chi_i(s)-1|^2\,d\mu(s) \ge \mu(D_k)$.

To start the proof of Proposition 7.9.1, for $n \ge 0$ we define
$$U_n = \Big\{\ell \in \mathbb{N}^* ;\ \int_{D_n}|\chi_\ell(s)-1|^2\,d\mu(s) \ge \mu(D_n)\Big\} . \tag{7.187}$$
As in Lemma 7.8.10 we have
$$U_0 = \mathbb{N}^* . \tag{7.188}$$


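Lemma 7.9.3 If $n \ge 0$ and $\ell \in U_n$, for each $i$ we have
$$\mu(D_n)E(|r^{j_n}\xi_{i,\ell}|^2\wedge 1) \le 4\int_{D_n}E(|r^{j_n}\xi_{i,\ell}(\chi_\ell(s)-1)|^2\wedge 1)\,d\mu(s) . \tag{7.189}$$

Proof We deduce from (7.136) that
$$|r^{j_n}\xi_{i,\ell}(\chi_\ell(s)-1)|^2\wedge 1 \ge |\chi_\ell(s)-1|^2(|r^{j_n}\xi_{i,\ell}|^2\wedge 1)/4 ,$$
we take expectation, and we integrate over $D_n$, using that $\int_{D_n}|\chi_\ell(s)-1|^2\,d\mu(s) \ge \mu(D_n)$ since $\ell \in U_n$. □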


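Corollary 7.9.4 If $n \ge 0$ we have
$$\sum_i\sum_{\ell\in U_n}E(|r^{j_n}\xi_{i,\ell}|^2\wedge 1) \le 2^{n+2}w . \tag{7.190}$$

Proof Summing the inequalities (7.189) over $i$ and $\ell \in U_n$, we obtain
$$\mu(D_n)\sum_i\sum_{\ell\in U_n}E(|r^{j_n}\xi_{i,\ell}|^2\wedge 1) \le 4\int_{D_n}\sum_i\sum_{\ell\ge1}E(|r^{j_n}\xi_{i,\ell}(\chi_\ell(s)-1)|^2\wedge 1)\,d\mu(s) . \tag{7.191}$$
Since the r.v.s $(\xi_{i,\ell})_{\ell\ge1}$ have disjoint supports, as in (7.174) we have
$$\sum_{\ell\ge1}E(|r^{j_n}\xi_{i,\ell}(\chi_\ell(s)-1)|^2\wedge 1) = E(|r^{j_n}(Z_i(s)-Z_i(0))|^2\wedge 1) .$$
Recalling the definition (7.174) of $\varphi_j$ and that $\varphi_{j_n}(s,0) \le w2^n$ for $s \in D_n$, we conclude that
$$\int_{D_n}\sum_iE(|r^{j_n}(Z_i(s)-Z_i(0))|^2\wedge 1)\,d\mu(s) = \int_{D_n}\varphi_{j_n}(s,0)\,d\mu(s) \le 2^nw\mu(D_n) .$$
Therefore we deduce from (7.191) that
$$\mu(D_n)\sum_i\sum_{\ell\in U_n}E(|r^{j_n}\xi_{i,\ell}|^2\wedge 1) \le 2^{n+2}w\mu(D_n) ,$$
which concludes the proof. □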
Proof of (7.180) We have
$$\sum_iP(|Z_i(0)| \ge r^{-j_0}) \le \sum_iE(|r^{j_0}Z_i(0)|^2\wedge 1) = \sum_{i,\ell\ge1}E(|r^{j_0}\xi_{i,\ell}|^2\wedge 1) ,$$
where we use in the equality that the r.v.s $(\xi_{i,\ell})_{\ell\ge1}$ have disjoint supports. Since $U_0 = \mathbb{N}^*$ by (7.188), the case $n = 0$ of (7.190) proves (7.180). □

We set $V_n = \bigcap_{0\le k\le n}U_k$, so that $V_0 = \mathbb{N}^*$ by (7.188). We define
$$u_\ell = \inf\{r^{-j_n} ;\ \ell \in V_n\} , \tag{7.192}$$
and we keep in mind that, by definition, for $\ell \in V_n$ we have $u_\ell \le r^{-j_n}$.


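Lemma 7.9.5 For each $i$ and each $\ell$, we have
$$|\xi_{i,\ell}|\mathbf{1}_{\{u_\ell\le|\xi_{i,\ell}|<r^{-j_0}\}} \le \sum_{\{n\ge1;\ \ell\in U_n\}}r^{-j_{n-1}}\mathbf{1}_{\{|\xi_{i,\ell}|\ge r^{-j_n}\}} . \tag{7.193}$$
The sum on the right is over the values of $n \ge 1$ such that $U_n$ contains $\ell$.

Proof Consider $\omega$ with $u_\ell \le |\xi_{i,\ell}(\omega)| < r^{-j_0}$. By the definition (7.192) of $u_\ell$, there exists $n$ such that $\ell \in V_n$ and $r^{-j_n} \le |\xi_{i,\ell}(\omega)|$. Consider the smallest integer $k \le n$ such that $r^{-j_k} \le |\xi_{i,\ell}(\omega)|$. Then since $r^{-j_k} \le |\xi_{i,\ell}(\omega)| < r^{-j_0}$ we have $k \ge 1$. Thus $|\xi_{i,\ell}(\omega)| < r^{-j_{k-1}}$, for otherwise $k$ would not be the smallest possible. Thus $|\xi_{i,\ell}(\omega)| \le r^{-j_{k-1}}\mathbf{1}_{\{|\xi_{i,\ell}|\ge r^{-j_k}\}}(\omega)$. Since $\ell \in V_n$ and $k \le n$, by definition of $V_n$ we have $\ell \in U_k$. The result follows by considering the term for $n = k$ in the sum in the right-hand side of (7.193). □

Proof of (7.177) Taking expectation in (7.193) and summing over $i$ and $\ell \ge 1$ shows that the left-hand side of (7.177) is bounded by
$$\sum_{n\ge1}r^{-j_{n-1}}\sum_i\sum_{\ell\in U_n}P(|\xi_{i,\ell}| \ge r^{-j_n}) \le \sum_{n\ge1}r^{-j_{n-1}}2^{n+2}w \le Lw\sum_{n\ge0}2^n r^{-j_n} ,$$
where we have used that $P(|\xi_{i,\ell}| \ge r^{-j_n}) \le E(|r^{j_n}\xi_{i,\ell}|^2\wedge 1)$ and (7.190) in the first inequality. □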
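Lemma 7.9.6 Recalling the quantities $a_\ell$ of (7.178), for each $n \ge 0$ we have
$$\sum_{\ell\in V_n}a_\ell^2 \le Lw2^n r^{-2j_n} . \tag{7.194}$$

Proof Let us write $\eta_{i,\ell} := \xi_{i,\ell}\mathbf{1}_{\{|\xi_{i,\ell}|<u_\ell\}}$, so that by definition (7.178) we have $a_\ell^2 = \sum_iE|\eta_{i,\ell}|^2$. When $\ell \in V_n$, as we noted we have $u_\ell \le r^{-j_n}$, so that since $|\eta_{i,\ell}| \le u_\ell$ we have $|r^{j_n}\eta_{i,\ell}|^2 \le 1$. Thus $E|r^{j_n}\eta_{i,\ell}|^2 = E(|r^{j_n}\eta_{i,\ell}|^2\wedge 1) \le E(|r^{j_n}\xi_{i,\ell}|^2\wedge 1)$, and (7.190) implies (since $V_n \subset U_n$)
$$\sum_i\sum_{\ell\in V_n}E|r^{j_n}\eta_{i,\ell}|^2 \le 4w2^n .$$
The left-hand side above is $r^{2j_n}\sum_{\ell\in V_n}\sum_iE|\eta_{i,\ell}|^2$, and recalling (7.178) this is $r^{2j_n}\sum_{\ell\in V_n}a_\ell^2$, so that (7.194) is proved. □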


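Lemma 7.9.7 For $n \ge 0$ let $W_n = V_n\setminus V_{n+1}$. Then
$$E\Big\|\sum_{\ell\in W_n}a_\ell g_\ell\chi_\ell\Big\| \le L\sqrt{w}\,2^n r^{-j_n} . \tag{7.195}$$

Proof Since $V_{n+1} = V_n\cap U_{n+1}$, for $\ell \in W_n$ we have $\ell \notin U_{n+1}$, so that by definition of that set,
$$\mu(D_{n+1}) > \int_{D_{n+1}}|\chi_\ell(s)-1|^2\,d\mu(s) = 2\mu(D_{n+1}) - 2\,\mathrm{Re}\int_{D_{n+1}}\chi_\ell(s)\,d\mu(s) ,$$
and in particular $|\int_{D_{n+1}}\chi_\ell(s)\,d\mu(s)| \ge \mu(D_{n+1})/2$. Thus (7.195) follows from (7.122) used with $A = D_{n+1}$ and (7.194). □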


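Proof of Proposition 7.9.1 It remains only to prove (7.179). For this we write
$$E\Big\|\sum_{\ell\in\bigcup_{n\ge0}W_n}a_\ell g_\ell\chi_\ell\Big\| \le \sum_{n\ge0}E\Big\|\sum_{\ell\in W_n}a_\ell g_\ell\chi_\ell\Big\| \le L\sqrt{w}\sum_{n\ge0}2^n r^{-j_n} ,$$
where we have used (7.195) in the last inequality, and we observe that by (7.192), for $\ell \notin \bigcup_{n\ge0}W_n$, i.e. $\ell \in \bigcap_{n\ge0}U_n$, we have $u_\ell = 0$, so that $a_\ell = 0$. □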


Exercise 7.9.8 Consider a random trigonometric sum $\sum_{i\in I}a_ig_i\chi_i$ and the associated distance $d$. Consider numbers $\epsilon_n$ such that $\mu(\{s \in T ;\ d(s,0) \le \epsilon_n\}) \ge N_n^{-1}$. Find a partition $I = \bigcup_nI_n$ such that $E\|\sum_{i\in I_n}a_ig_i\chi_i\| \le L2^{n/2}\epsilon_n$. The point of this result is that if $\epsilon_n$ is as small as possible, then by (7.4) and (7.13) we have $\sum_{n\ge0}2^{n/2}\epsilon_n \le LE\|\sum_ia_ig_i\chi_i\|$. Then in the bound $E\|\sum_ia_ig_i\chi_i\| \le \sum_{n\ge0}E\|\sum_{i\in I_n}a_ig_i\chi_i\|$, the right- and the left-hand sides are of the same order.
Hint: Copy the previous arguments. If $D_n = \{s \in T ;\ d(s,0) \le \epsilon_n\}$ and $U_n$ is given by (7.187), then define $I_n = \{i \in I ;\ \forall k \le n,\ i \in U_k,\ i \notin U_{n+1}\}$.


Exercise 7.9.9 ([86] Proposition 4.5) Consider characters $(\chi_i)_{i\le N}$. Assume that $\int\exp(|\sum_{i\le N}\chi_i|^2/(CN))\,d\mu \le 2$. Prove that $E\|\sum_{i\le N}g_i\chi_i\| \ge N/K(C)$.

Hint: Prove that if a set $D$ satisfies $\sqrt{\log(2/\mu(D))} \le \sqrt{N}/K(C)$ then $\sup_{s\in D}\sum_{i\le N}|\chi_i(s)-1|^2 \ge N$, by proving that $\int_D\sum_{i\le N}|\chi_i(s)-1|^2\,d\mu(s) \ge N\mu(D)$.
7.10 Proofs, Convergence


After the hard work of proving inequalities such as (7.68) and (7.148) has been 
completed, the proof of Theorem 7.5.16, which is the goal of this section, involves 
only “soft arguments”. To prove convergence of a series of independent symmetric 
r.v.s, we shall use the following general principle, which relates the convergence a.s. 
of a random series with its convergence in probability. 


Lemma 7.10.1 Consider independent symmetric Banach space valued r.v.s $W_i$. Then the series $\sum_{i\ge1}W_i$ converges a.s. if and only if it is a Cauchy sequence in probability, i.e.
$$\forall\delta > 0 ,\ \exists k_0 ,\quad k_0 \le k \le n \Rightarrow P\Big(\Big\|\sum_{k<i\le n}W_i\Big\| \ge \delta\Big) \le \delta . \tag{7.196}$$


Proof It suffices to prove that (7.196) implies convergence. Let $S_k = \sum_{i\le k}W_i$. Then the Lévy inequality
$$P\Big(\sup_{k\le n}\|S_k\| \ge a\Big) \le 2P(\|S_n\| \ge a)$$
(see [53], page 47, equation (2.6)) implies
$$P\Big(\sup_k\|S_k\| > a\Big) \le 2\sup_nP(\|S_n\| \ge a) ,$$
and starting the sum at an integer $k_0$ as in (7.196) rather than at 1, we obtain
$$P\Big(\sup_{k\ge k_0}\|S_k - S_{k_0}\| > a\Big) \le 2\sup_{n\ge k_0}P(\|S_n - S_{k_0}\| \ge a) .$$
For $a = \delta$ the right-hand side above is $\le 2\delta$. Since $\|S_n - S_k\| \le \|S_n - S_{k_0}\| + \|S_k - S_{k_0}\|$, this proves that
$$P\Big(\sup_{k_0\le k\le n}\|S_n - S_k\| > 2\delta\Big) \le 4\delta ,$$
and in turn that a.s. the sequence $(S_k(\omega))_{k\ge1}$ is a Cauchy sequence. □
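The simplest instance of the Lévy inequality is the real-valued simple symmetric random walk, for which both probabilities can be computed exactly by enumerating all $2^n$ equally likely sign sequences. The sketch below (a hypothetical helper, limited to small $n$ because of the enumeration) compares $P(\max_{k\le n}|S_k| \ge a)$ with $2P(|S_n| \ge a)$.

```python
import itertools

def levy_probabilities(n, a):
    """Exact P(max_{k<=n} |S_k| >= a) and 2 P(|S_n| >= a) for a simple symmetric random walk S_k."""
    hit_max = hit_end = 0
    for steps in itertools.product((-1, 1), repeat=n):   # all 2^n equally likely paths
        s = running_max = 0
        for x in steps:
            s += x
            running_max = max(running_max, abs(s))
        hit_max += running_max >= a
        hit_end += abs(s) >= a
    total = 2 ** n
    return hit_max / total, 2 * hit_end / total

for n in (4, 8, 10):
    print(n, [levy_probabilities(n, a) for a in (1, 2, 3)])
```

Equality can occur (for instance $n = 3$, $a = 2$), so the factor 2 cannot be improved in general.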


Exercise 7.10.2

(a) Let $(W_i)_{i\ge1}$ be independent symmetric real-valued r.v.s. Assume that for some $a > 0$ (or, equivalently, all $a > 0$) we have
$$\sum_{i\ge1}E(W_i^2\wedge a^2) < \infty . \tag{7.197}$$
Prove that the series $\sum_{i\ge1}W_i$ converges a.s.

(b) Prove the converse.


Exercise 7.10.3 The neighborhoods of zero for the convergence in probability are the sets of functions such that $P(|f| \ge \delta) \le \delta$ for some $\delta > 0$. Prove that a function is “small in probability” if and only if there is a set of probability almost 1 on which the integral of the function is small. Prove that the convergence in $L^p$ ($p \ge 1$) is stronger than convergence in probability.


We will prove Theorems 7.5.16 and 7.5.17 at the same time by proving the following statements. In each of them, $(Z_i)_{i\ge1}$ is an independent sequence with $Z_i \in \mathcal{C}_G$, and $(\varepsilon_i)_{i\ge1}$ are independent Bernoulli r.v.s, independent of the sequence $(Z_i)$.


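Lemma 7.10.4 If the series $\sum_{i\ge1}\varepsilon_iZ_i$ converges a.s., then for each $\alpha > 0$ there exists $M$ such that for each $k$ we have $P(\|S_k\| \ge M) \le \alpha$, where $S_k$ is the partial sum $S_k = \sum_{i\le k}\varepsilon_iZ_i$.

Proof Denoting by $S$ the sum of the series, given $\alpha > 0$ there exists $k_0$ such that $P(\|S - S_k\| \ge 1) \le \alpha/2$ for $k \ge k_0$. Consider then $M_0$ such that $P(\|S\| \ge M_0) \le \alpha/2$, so that for $k \ge k_0$ we have $P(\|S_k\| \ge M_0 + 1) \le \alpha$. Since each of the finitely many $S_k$ with $k < k_0$ also satisfies $P(\|S_k\| \ge M) \le \alpha$ for $M$ large enough, the result follows. □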


The next result is a version of Theorem 7.5.1 adapted to infinite sums. We recall the number $\alpha_0$ of this theorem.


Lemma 7.10.5 Consider an independent sequence (Z;)i>1 with Z; € CG, and let 
Sk = Seite €;Z;, where the Bernoulli r.v.s ¢; are independent of the Z;. Assume 
that for each k we have 


P(||Sx |] => M) < a0. (7.198) 


For j € Z we define 


gj(s.t) =) E(ir/ (Zils) — Zi) PAD. (7.199) 


i>1 


Then we can find integers (jn)n>o such that 


Vs,téET, dp(s,t) <1 (7.200) 
u({s ; gj,(s,0) <2") >= N, 1, (7.201) 

and 
ore RM (7.202) 


n>0 
Lemma 7.10.6 Assume that there exist integers $(j_n)_{n\ge0}$ as in (7.200) to (7.202). Then there exists a decomposition of $Z_i$ as in Corollary 7.9.2, except that all finite sums $\sum_i$ are replaced by infinite sums $\sum_{i\ge1}$.


Lemma 7.10.7 When there is a decomposition of $Z_i$ as in Lemma 7.10.6, the series $\sum_{i\ge1}\varepsilon_iZ_i$ converges a.s.


Proof of Theorems 7.5.16 and 7.5.17 The theorems follow from these lemmas, using also that the last statement of Theorem 7.5.16 follows from Theorem 7.5.5. □


Proof of Lemma 7.10.5 The reader should review Theorem 7.5.1 at this stage, as our proof consists of using this result for each $k$ and a straightforward limiting argument. Let us define

$$\varphi_{k,j}(s,t)=\sum_{i\le k}E\big(|r^j(Z_i(s)-Z_i(t))|^2\wedge1\big)\,,$$

so that

$$\varphi_j(s,t)=\lim_{k\to\infty}\varphi_{k,j}(s,t)\,.\tag{7.203}$$

Theorem 7.5.1 implies that for each $k$ we can find numbers $(j_{k,n})_{n\ge0}$ for which

$$\forall s,t\in T\,,\ \varphi_{k,j_{k,0}}(s,t)\le1\,,\tag{7.204}$$

and, for $n\ge0$,

$$\mu(\{s\,;\,\varphi_{k,j_{k,n}}(s,0)\le2^n\})\ge N_n^{-1}\,,\tag{7.205}$$

such that the following holds:

$$\sum_{n\ge0}2^nr^{-j_{k,n}}\le KM\,.\tag{7.206}$$


The conclusion will then follow by a limiting argument that we detail now. The plan is to take a limit $k\to\infty$ in (7.204) and (7.205). As a first step, for each $n$ we would like to take the limit $\lim_{k\to\infty}j_{k,n}$. We will ensure that the limit exists by taking a subsequence. It follows from (7.206) that for each $n$ the sequence $(j_{k,n})_k$ is bounded from below. To ensure that it is also bounded from above, we consider any sequence $(j_n^*)$ such that $\sum_{n\ge0}2^nr^{-j_n^*}\le LM$, and we replace $j_{k,n}$ by $\min(j_{k,n},j_n^*)$. Thus $j_{k,n}$ is now bounded from above by $j_n^*$; since $\varphi_{k,j}$ increases with $j$, (7.204) and (7.205) still hold, and (7.206) still holds with a larger constant $K$. Thus we can find a sequence $(k(q))$ with $k(q)\to\infty$ such that for each $n$, $j_n=\lim_{q\to\infty}j_{k(q),n}$ exists, i.e., for each $n$, $j_{k(q),n}=j_n$ for $q$ large enough. By taking a further subsequence if necessary, we may assume that for each $n\ge0$ we have

$$q\ge n\ \Rightarrow\ j_{k(q),n}=j_n\,,$$

so that then $\varphi_{k(q),j_n}=\varphi_{k(q),j_{k(q),n}}$. Consequently (7.204) implies

$$\forall s,t\in T\,,\ \varphi_{k(q),j_0}(s,t)\le1\,,\tag{7.207}$$

and (7.205) implies that for $n\ge1$ and $q\ge n$

$$\mu(\{s\,;\,\varphi_{k(q),j_n}(s,0)\le2^n\})\ge N_n^{-1}\,,\tag{7.208}$$

while, from (7.206),

$$\sum_{0\le n\le q}2^nr^{-j_n}=\sum_{0\le n\le q}2^nr^{-j_{k(q),n}}\le KM\,.$$

Letting $q\to\infty$ proves that $\sum_{n\ge0}2^nr^{-j_n}\le KM$. On the other hand, (7.203) implies $\varphi_j(s,t)=\lim_{q\to\infty}\varphi_{k(q),j}(s,t)$. Since $\varphi_{k,j}$ increases with $k$, the sets in (7.208) decrease as $q$ increases, so that their intersection still has measure $\ge N_n^{-1}$. Together with (7.207) and (7.208), this proves that

$$\forall s,t\in T\,,\ \varphi_{j_0}(s,t)\le1\,,$$

and for each $n$,

$$\mu(\{s\,;\,\varphi_{j_n}(s,0)\le2^n\})\ge N_n^{-1}\,.$$

□


Proof of Lemma 7.10.6 Copy the proof of Corollary 7.9.2 verbatim. □
Before we prove Lemma 7.10.7 we need another simple result. 


Lemma 7.10.8 Consider a decreasing sequence of translation-invariant distances $(d_k)_{k\ge1}$ on $T$. Assume $\gamma_2(T,d_1)<\infty$ and that for each $s\in T$ we have $\lim_{k\to\infty}d_k(s,0)=0$. Then $\lim_{k\to\infty}\gamma_2(T,d_k)=0$.


Proof Given $\epsilon>0$, since $\lim_{k\to\infty}d_k(s,0)=0$ for each $s\in T$, we have $T=\bigcup_kB_k$ where $B_k=\{s\in T\,;\,\forall n\ge k,\ d_n(s,0)\le\epsilon\}$. Thus for $k$ large enough we have $\mu(B_k)>1/2$. Corollary 7.1.4 and Lemma 7.1.6 prove then that $\Delta(T,d_k)\le4\epsilon$. We have shown that $\lim_{k\to\infty}\Delta(T,d_k)=0$.

Next, according to (7.4) we can find numbers $\epsilon_n$ with $\mu(\{s\,;\,d_1(s,0)\le\epsilon_n\})\ge N_n^{-1}$ and $\sum_{n\ge0}2^{n/2}\epsilon_n<\infty$. Let $\epsilon_{n,k}=\min(\epsilon_n,\Delta(T,d_k))$. Then since $d_k\le d_1$ we have $\{s\,;\,d_1(s,0)\le\epsilon_n\}\subset\{s\,;\,d_k(s,0)\le\epsilon_{n,k}\}$, so that this latter set has measure $\ge N_n^{-1}$, and by (7.4) again we have $\gamma_2(T,d_k)\le L\sum_{n\ge0}2^{n/2}\epsilon_{n,k}$. The right-hand side goes to 0 as $k\to\infty$ by dominated convergence. □
Proof of Lemma 7.10.7 For each $\ell=1,2,3$, we will prove that the series $\sum_{i\ge1}\varepsilon_iZ_i^\ell$ converges a.s. For $\ell=1$ this is obvious since by (7.181) a.s. only finitely many of the terms are $\ne0$. For $\ell=2$ and $\ell=3$, we will deduce this from Lemma 7.10.1. For $\ell=2$ this follows from the fact that $\|\sum_{k\le i\le n}\varepsilon_iZ_i^2\|\le\sum_{i\ge k}|Z_i^2(0)|$, since $|Z_i^2(t)|=|Z_i^2(0)|$ because $Z_i\in\mathcal{CG}$. So let us turn to the hard case $\ell=3$. For $k\ge1$ consider the distance $d_k$ on $T$ defined by $d_k(s,t)^2=\sum_{i\ge k}E|Z_i^3(s)-Z_i^3(t)|^2$. By the version of (7.33) proposed in Exercise 7.5.15 we have $E\|\sum_{k\le i\le n}\varepsilon_iZ_i^3\|\le L\gamma_2(T,d_k)$, so that, using Markov's inequality, to obtain (7.196) it suffices to prove that $\lim_{k\to\infty}\gamma_2(T,d_k)=0$. But this follows from Lemma 7.10.8, since by (7.183) we have $\gamma_2(T,d_1)<\infty$, and since for each $s$, $d_k(s,0)^2$ is the tail of the convergent series $d_1(s,0)^2$, so that $d_k(s,0)\to0$. □


7.11 Further Proofs 


The proof we gave of Proposition 7.5.13 uses p-stable r.v.s. It is a matter of taste, but 
for the author this feels like an unnatural trick. Our first goal in the present section 
is to provide a proof of Proposition 7.11.3, an extension of Proposition 7.5.13 which 
does not use this trick and which brings forward the combinatorics of the situation. 
Finally we will prove Theorem 7.5.18. 


7.11.1 Alternate Proof of Proposition 7.5.13 


Consider $1<p<2$. Consider numbers $a_i$ and characters $\chi_i$, and recall the distance $d_p$ of (7.84). We assume that the independent symmetric r.v.s $\theta_i$ satisfy (7.91), i.e., $P(|\theta_i|\ge u)=1/(Cu^p)$ for $u\ge C$, and we set

$$\varphi_j(s,t)=\sum_iE\big(|r^ja_i\theta_i(\chi_i(s)-\chi_i(t))|^2\wedge1\big)\,.\tag{7.209}$$


Proposition 7.11.1 Consider a sequence $(j_n)_{n\ge0}$ and assume that

$$\forall s,t\in T\,,\ \varphi_{j_0}(s,t)\le1\,,\tag{7.210}$$

$$\forall n\ge1\,,\ \mu(\{s\in T\,;\,\varphi_{j_n}(s,0)\le2^n\})\ge N_n^{-1}\,.\tag{7.211}$$

Then

$$\gamma_q(T,d_p)\le K\gamma_2(T,d_2)+K\sum_{n\ge0}2^nr^{-j_n}\,,\tag{7.212}$$

where $q$ is the conjugate exponent of $p$ and where $d_2$ is the distance given by (7.84) for $p=2$. Here $K$ depends on $p$, $r$, and $C$.
Proof Combining (7.5) and (7.6), we obtain

$$\sum_{n\ge0}2^{n/2}\epsilon_n(T,d_2)\le K\gamma_2(T,d_2)\,.$$

Let us set $\alpha_n=2^{n/2}\epsilon_n(T,d_2)$. The first step is to reduce to the case where

$$\forall n\ge0\,,\ \alpha_n\le2^nr^{-j_n}\,.\tag{7.213}$$

The purpose of this condition is certainly not obvious now and will become apparent only later in the proof. To obtain this condition, we construct a sequence $(j_n')$ as follows. For each $n\ge0$, if $2^nr^{-j_n}\ge\alpha_n$ we set $j_n'=j_n$. Otherwise, we define $j_n'$ as the largest integer such that $2^nr^{-j_n'}\ge\alpha_n$. Thus $j_n'\le j_n$ and $2^nr^{-j_n'}\le r\alpha_n$. We then have

$$\forall n\ge0\,,\ \alpha_n\le2^nr^{-j_n'}\tag{7.214}$$

and since $\sum_{n\ge0}\alpha_n\le K\gamma_2(T,d_2)$ this yields

$$\sum_{n\ge0}2^nr^{-j_n'}\le\sum_{n\ge0}2^nr^{-j_n}+Kr\gamma_2(T,d_2)\,.\tag{7.215}$$

Since $\varphi_j(s,t)$ is increasing with $j$, (7.210) and (7.211) hold for $j_n'$ instead of $j_n$. That is, replacing the sequence $(j_n)$ by the sequence $(j_n')$, we may assume that (7.213) holds.
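The passage from $(j_n)$ to $(j_n')$ is a purely arithmetic operation. The Python sketch below carries it out with exact rational arithmetic; the function name `reduce_indices` and the sample inputs are hypothetical and serve only as an illustration. The loop at the end checks (7.214), $j_n'\le j_n$, and the bound $2^nr^{-j_n'}\le r\alpha_n$ whenever $j_n$ was changed.

```python
from fractions import Fraction

def reduce_indices(js, alphas, r):
    """Given integers j_n, numbers alpha_n > 0 and r >= 2, return j'_n:
    j'_n = j_n if 2^n r^{-j_n} >= alpha_n, otherwise the largest integer j
    with 2^n r^{-j} >= alpha_n (which is then smaller than j_n)."""
    out = []
    for n, (j, alpha) in enumerate(zip(js, alphas)):
        alpha = Fraction(alpha)
        term = lambda k: Fraction(2) ** n / Fraction(r) ** k  # 2^n r^{-k}
        k = j
        while term(k) < alpha:  # 2^n r^{-k} increases as k decreases
            k -= 1
        out.append(k)
    return out

js, alphas, r = [0, 1, 3], [Fraction(1, 2), 5, Fraction(1, 100)], 4
new = reduce_indices(js, alphas, r)
print(new)  # [0, -1, 3]
for n, (j, jp, a) in enumerate(zip(js, new, alphas)):
    t = Fraction(2) ** n / Fraction(r) ** jp
    assert jp <= j and a <= t                # j'_n <= j_n and (7.214)
    assert jp == j or t <= r * Fraction(a)   # 2^n r^{-j'_n} <= r alpha_n
```

Exact fractions are used so that the comparisons $2^nr^{-j}\ge\alpha_n$ are not affected by rounding.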

The main argument starts now. For $n\ge0$ we construct sets $B_n$. The idea is that these sets are of rather large measure while being small for both $d_2$ and $\varphi_{j_n}$ (following the philosophy of Theorem 4.5.13). We will then show that these sets are also small for $d_p$, and this will yield (7.212). We choose $B_0=T$. By (7.211), for $n\ge1$ the set $A_n:=\{s\in T\,;\,\varphi_{j_n}(s,0)\le2^n\}$ satisfies $\mu(A_n)\ge1/N_n$. Furthermore we can cover $T$ by $N_n$ balls $(C_j)_{j\le N_n}$ of radius $\le2\epsilon_n(T,d_2)$. The sets $A_n\cap C_j$ for $j\le N_n$ cover $A_n$, so that $\mu(A_n)\le\sum_{j\le N_n}\mu(A_n\cap C_j)$. Thus one of the sets $A_n\cap C_j$ (call it $B_n$) is such that $\mu(B_n)\ge\mu(A_n)/N_n\ge1/N_n^2=1/N_{n+1}$. Since $B_n\subset C_j$ we have

$$\Delta(B_n,d_2)\le4\epsilon_n(T,d_2)\,.\tag{7.216}$$

Our goal next is to prove that for $n\ge0$ we have

$$s,t\in B_n\ \Rightarrow\ d_p(s,t)\le K2^{n/p}r^{-j_n}\,.\tag{7.217}$$

Since $\varphi_j$ is the square of a distance, and since $\varphi_{j_n}(s,0)\le2^n$ for $s\in A_n$, we have

$$s,t\in B_n\ \Rightarrow\ \varphi_{j_n}(s,t)\le2\big(\varphi_{j_n}(s,0)+\varphi_{j_n}(0,t)\big)\le2^{n+2}\,.\tag{7.218}$$
Next, for any number $b$ with $|b|\le1/C$, using (7.91) we have

$$E(|b\theta_i|^2\wedge1)\ge P(|\theta_i|\ge1/|b|)=|b|^p/C\,.\tag{7.219}$$

Thus for $|b|\le1/C$ we have $|b|^p\le KE(|b\theta_i|^2\wedge1)$. Consequently, since $|b|^p\le K|b|^2$ for $|b|\ge1/C$, we have

$$|b|^p\le KE(|b\theta_i|^2\wedge1)+K|b|^2\,.$$

Using this for $b=r^{j_n}a_i(\chi_i(s)-\chi_i(t))$ and summing over $i$ we get

$$r^{pj_n}d_p(s,t)^p\le K\varphi_{j_n}(s,t)+Kr^{2j_n}d_2(s,t)^2\,.\tag{7.220}$$

Using (7.218), and recalling that by (7.216) we have $d_2(s,t)\le4\epsilon_n(T,d_2)$, we have proved that

$$s,t\in B_n\ \Rightarrow\ r^{pj_n}d_p(s,t)^p\le K2^n+Kr^{2j_n}\epsilon_n(T,d_2)^2\le K2^n\,,$$

where we have used in the last inequality that $2^{n/2}\epsilon_n(T,d_2)\le2^nr^{-j_n}$ by (7.213), i.e., $r^{j_n}\epsilon_n(T,d_2)\le2^{n/2}$. We have proved (7.217).

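To finish the proof, using the translation invariance of $d_p$ and $\varphi_{j_n}$, it follows from (7.217) that for $n\ge1$

$$\mu(\{s\in T\,;\,d_p(s,0)\le K2^{n/p}r^{-j_n}\})\ge\mu(B_n)\ge N_{n+1}^{-1}\,.$$

Then (7.5) and the definition (7.3) of $\epsilon_n$ imply that $\epsilon_{n+1}(T,d_p)\le K2^{n/p}r^{-j_n}$. Since $\epsilon_0(T,d_p)\le\Delta(T,d_p)\le Kr^{-j_0}$ by (7.217) used for $n=0$, we then obtain (7.212) (using (7.6) in the first inequality, and $1/p+1/q=1$ in the second one):

$$\gamma_q(T,d_p)\le K\sum_{n\ge0}2^{n/q}\epsilon_n(T,d_p)\le K\sum_{n\ge0}2^nr^{-j_n}\,.$$

□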
Corollary 7.11.2 Under the conditions of Proposition 7.11.1 we have

$$\gamma_q(T,d_p)\le K\sum_{n\ge0}2^nr^{-j_n}+K\sum_i|a_i|\mathbf{1}_{\{|a_i|\ge r^{-j_0}\}}\,.\tag{7.221}$$


Proof The idea is to use Corollary 7.8.12 to control the term $\gamma_2(T,d_2)$ of (7.212). First

$$E\big(|r^ja_i\theta_i(\chi_i(s)-\chi_i(t))|^2\wedge1\big)\ge P(|\theta_i|\ge1)\big(|r^ja_i(\chi_i(s)-\chi_i(t))|^2\wedge1\big)\,.\tag{7.222}$$

Let us set $W_0=\max_iP(|\theta_i|\ge1)^{-1}$, so that by (7.91) we have $W_0\le K$, where $K$ depends on $C$ only. Let us set $\psi_j(s,t)=\sum_i|r^ja_i(\chi_i(s)-\chi_i(t))|^2\wedge1$. Thus, recalling (7.209), it follows from (7.222) that $\psi_j(s,t)\le W_0\varphi_j(s,t)$. Setting $D_n=\{s\in T\,;\,\psi_{j_n}(s,0)\le W_02^n\}$, it follows from (7.210) and (7.211) that $D_0=T$ and $\mu(D_n)\ge N_n^{-1}$ for $n\ge1$. We appeal to Corollary 7.8.12 to obtain that $\gamma_2(T,d_2)\le K\sum_{n\ge0}2^nr^{-j_n}+K\sum_i|a_i|\mathbf{1}_{\{|a_i|\ge r^{-j_0}\}}$. Combining this with (7.212) implies (7.221). □


We are now ready to prove the following generalization of Proposition 7.5.13, which is of independent interest:


Proposition 7.11.3 Consider $1<p<2$.⁴⁶ Consider independent symmetric r.v.s $\theta_i$ which satisfy (7.91). Then there is a constant $\alpha$ depending only on $C$ such that for any numbers $a_i$ and characters $\chi_i$, we have

$$P\Big(\Big\|\sum_ia_i\theta_i\chi_i\Big\|\ge M\Big)\le\alpha\ \Rightarrow\ \gamma_q(T,d_p)\le KM\,,\tag{7.223}$$

where $d_p$ is the distance (7.84) and $q$ is the conjugate of $p$.


Proof Define as usual $\varphi_j(s,t)=\sum_iE\big(|r^ja_i\theta_i(\chi_i(s)-\chi_i(t))|^2\wedge1\big)$. According to Theorem 7.5.1, if $\alpha$ is small enough we can find numbers $j_n\in\mathbb{Z}$ such that $D_0=T$, $\mu(D_n)\ge N_n^{-1}$ for $n\ge1$ and $\sum_{n\ge0}2^nr^{-j_n}\le KM$, where $D_n=\{s\in T\,;\,\varphi_{j_n}(s,0)\le2^n\}$. The conditions (7.210) and (7.211) of Proposition 7.11.1 are then satisfied, so that (7.221) of Corollary 7.11.2 holds, and this inequality implies

$$\gamma_q(T,d_p)\le KM+K\sum_i|a_i|\mathbf{1}_{\{|a_i|\ge r^{-j_0}\}}\,.\tag{7.224}$$

To control the last term, we will prove that

$$\mathrm{card}\{i\,;\,|a_i|\ge r^{-j_0}\}\le K\,;\qquad\max_i|a_i|\le KM\,,\tag{7.225}$$

which will end the proof. We appeal to Lemma 7.8.19: (7.160) implies $\sum_iE(|r^{j_0}a_i\theta_i|^2\wedge1)\le2$. Since $E(|r^{j_0}a_i\theta_i|^2\wedge1)\ge1/K$ for $|a_i|\ge r^{-j_0}$, this proves the first part of (7.225). Consider now a certain index $i_0$. We are going to prove that if $\alpha$ is small enough then $|a_{i_0}|\le M/C$, concluding the proof. Let $\eta_i=-1$ if $i\ne i_0$ and $\eta_{i_0}=1$, so that the sequence $(\eta_i\theta_i)$ has the same distribution as the sequence $(\theta_i)$, and thus when $P(\|\sum_ia_i\theta_i\chi_i\|<M)\ge1-\alpha$ we also have $P(\|\sum_i\eta_ia_i\theta_i\chi_i\|<M)\ge1-\alpha$. Since $2|a_{i_0}\theta_{i_0}|\le\|\sum_ia_i\theta_i\chi_i\|+\|\sum_i\eta_ia_i\theta_i\chi_i\|$ we then have

$$P(|a_{i_0}\theta_{i_0}|\ge M)\le P\Big(\Big\|\sum_ia_i\theta_i\chi_i\Big\|\ge M\Big)+P\Big(\Big\|\sum_i\eta_ia_i\theta_i\chi_i\Big\|\ge M\Big)\le2\alpha\,.$$

On the other hand from (7.91) we have $P(|\theta_{i_0}|\ge C)=1/C^{p+1}$. Assuming that we have chosen $\alpha$ small enough that $2\alpha<1/C^{p+1}$, we then conclude, since $u\mapsto P(|\theta_{i_0}|\ge u)$ is non-increasing, that $C<M/|a_{i_0}|$. □


⁴⁶ We leave it as a challenge to the reader to consider the case $p=2$. In that case it suffices to assume that for a certain $\beta>0$ we have $P(|\theta_i|\ge\beta)\ge\beta$.
7.11.2 Proof of Theorem 7.5.18


Proof of Theorem 7.5.18 (b) The condition $\gamma_q(T,d_p)<\infty$ is stronger than the sufficient condition of Theorem 7.5.16. This is shown by Proposition 7.5.9.⁴⁷ □


Proof of Theorem 7.5.18 (c) Combine Proposition 7.11.3, Lemma 7.10.4, and the next lemma. □


Lemma 7.11.4 Consider an increasing sequence $(d_k)$ of translation-invariant (quasi) distances on $T$. Assume that the limiting distance $d(s,t)=\lim_{k\to\infty}d_k(s,t)$ is finite. Then $\gamma_q(T,d)\le K\sup_k\gamma_q(T,d_k)$.


Proof Combining (7.5) and (7.6), we obtain that for any translation-invariant distance $\delta$ we have $\sum_{n\ge0}2^{n/q}\epsilon_n(T,\delta)\le\gamma_q(T,\delta)\le K\sum_{n\ge0}2^{n/q}\epsilon_n(T,\delta)$, so that it suffices to prove that $\epsilon_n(T,d)\le2\lim_{k\to\infty}\epsilon_n(T,d_k)$. According to Lemma 2.9.3 (a), given $a<\epsilon_n(T,d)$ we can find points $(t_\ell)_{\ell\le N_n}$ such that $d(t_\ell,t_{\ell'})>a$ for $\ell\ne\ell'$. Then for $k$ large enough we also have $d_k(t_\ell,t_{\ell'})>a$, and thus $\epsilon_n(T,d_k)\ge a/2$. □


Exercise 7.11.5 Complete the proof of Theorem 7.5.18 (a) using similar but much 
easier arguments. 


7.12 Explicit Computations 


In this section we give some examples of concrete results that follow from the 
abstract theorems that we stated. The link between the abstract theorems and 
the classical results of Paley and Zygmund and Salem and Zygmund has been 
thoroughly investigated by Marcus and Pisier [61], and there is no point reproducing 
it here. Rather, we develop a specific direction that definitely goes beyond these 
results. It was initiated in [118] and generalized in [33]. There is a seemingly 
infinite number of variations on the same theme. The one variation we present has no 
specific importance but illustrates how precisely these matters are now understood; 
see Theorem 7.12.5 as a vivid example. 

We shall consider only questions of convergence. We use the notation of Exercise 7.3.9, so that $T$ is the group of complex numbers of modulus 1, and for $t\in T$, $\chi_i(t)=t^i$ is the $i$-th power of $t$. We consider independent r.v.s $(X_i)_{i\ge1}$ and complex numbers $(a_i)_{i\ge1}$, and we are interested in the case where⁴⁸

$$Z_i(t)=a_iX_i\chi_i(t)=a_iX_it^i\,.\tag{7.226}$$


⁴⁷ Or, more accurately, by the version of the proposition when finite sums are replaced by series.
⁴⁸ The reason behind our formulation is that soon the r.v.s $(X_i)$ will be assumed to be i.i.d.
We make the following assumption:

$$\sum_{i\ge1}E(|a_iX_i|^2\wedge1)<\infty\,.\tag{7.227}$$

To study the convergence of the series, we assume without loss of generality that $a_i\ne0$ for each $i$.


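Theorem 7.12.1 Under the previous conditions, for each $n\ge0$ there exists a number $\lambda_n$ such that

$$\sum_{i\ge N_n}E\Big(\frac{|a_iX_i|^2}{\lambda_n^2}\wedge1\Big)=2^n\,,\tag{7.228}$$

and the series $\sum_{i\ge1}a_i\varepsilon_iX_i\chi_i$ converges uniformly a.s. whenever

$$\sum_{n\ge0}2^n\lambda_n<\infty\,.\tag{7.229}$$

As a consequence we obtain the following:


Corollary 7.12.2 If

$$\sum_{n\ge0}2^{n/2}\Big(\sum_{i\ge N_n}|a_i|^2\Big)^{1/2}<\infty\,,\tag{7.230}$$

then the series $\sum_{i\ge1}a_i\varepsilon_i\chi_i$ converges uniformly a.s.


Proof Since $|\varepsilon_i|=1$, the number $\lambda_n$ of (7.228) satisfies $\lambda_n\le2^{-n/2}(\sum_{i\ge N_n}|a_i|^2)^{1/2}$, and under (7.230) such a sequence satisfies (7.229). □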
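The number $\lambda_n$ of (7.228) is only defined implicitly, but the left-hand side of (7.228) is continuous and non-increasing in $\lambda_n$, so it can be computed by bisection. The Python sketch below does this in the situation of Corollary 7.12.2, where $|X_i|=1$, for a finite truncation $a_1,\dots,a_M$ of the sequence. The truncation length, the choice $a_i=1/i$, and the helper names `N`, `tail_sum` and `solve_lambda` are illustrative assumptions, not part of the text. It also lets one compare $\lambda_n$ with the bound $2^{-n/2}(\sum_{i\ge N_n}|a_i|^2)^{1/2}$ used in the proof.

```python
import math

def N(n):
    """N_0 = 1 and N_n = 2^(2^n) for n >= 1."""
    return 1 if n == 0 else 2 ** (2 ** n)

def tail_sum(sq, lam):
    """sum over the tail of min(|a_i|^2 / lam^2, 1); sq holds the values |a_i|^2."""
    return sum(min(x / lam ** 2, 1.0) for x in sq)

def solve_lambda(a, n, iters=100):
    """Bisection for (7.228) when |X_i| = 1; a[i-1] holds a_i (finite truncation)."""
    sq = [abs(x) ** 2 for x in a[N(n) - 1:]]
    target = 2 ** n
    if len(sq) <= target:
        raise ValueError("truncation too short: need more than 2^n tail terms")
    hi = math.sqrt(sum(sq) / target)   # here tail_sum(sq, hi) <= target
    lo = 0.5 * math.sqrt(min(sq))      # here every term is 1, so tail_sum > target
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tail_sum(sq, mid) > target:
            lo = mid
        else:
            hi = mid
    return hi

a = [1.0 / i for i in range(1, 2001)]
for n in range(4):
    bound = 2 ** (-n / 2) * math.sqrt(sum(x * x for x in a[N(n) - 1:]))
    print(n, solve_lambda(a, n), bound)

spiky = a.copy()
spiky[3] = 10.0        # one large coefficient a_4 saturates the truncation at 1
print(solve_lambda(spiky, 1))
```

For $a_i=1/i$ no term is saturated and $\lambda_n$ coincides with the bound, while the large coefficient in `spiky` makes $\lambda_1$ much smaller than the bound, which shows that the inequality in the proof can be far from sharp.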


Exercise 7.12.3. Compare this result with (7.38). 


Proof of Theorem 7.12.1 First we observe from (7.227) that for any $N$ the function $\psi(y)=\sum_{i\ge N}E(|ya_iX_i|^2\wedge1)$ is continuous and satisfies $\lim_{y\to0}\psi(y)=0$ and $\lim_{y\to\infty}\psi(y)=\infty$, and this proves the existence of $\lambda_n$. The proof will then rely on Theorem 7.5.16. For a change, throughout this section, we use the value $r=2$. Let us consider $s\in T$, and let us assume that for some integer $n\ge1$ we have

$$|s-1|\le\frac{1}{N_{n+1}}\,.\tag{7.231}$$

Let us observe the following inequality (which holds since $s^i-1=(s-1)(1+s+\dots+s^{i-1})$ and $|s|=1$): for $i\ge1$,

$$|s^i-1|\le i|s-1|\,.\tag{7.232}$$
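We then write, for any integer $j\in\mathbb{Z}$, using also that $|s^i-1|\le|s|^i+1\le2$ in the last line,

$$\begin{aligned}\sum_{i\ge1}E\big(|2^j(Z_i(s)-Z_i(0))|^2\wedge1\big)&=\sum_{i\ge1}E\big(|2^ja_iX_i(s^i-1)|^2\wedge1\big)\\&\le\sum_{0\le m<n}\ \sum_{N_m\le i<N_{m+1}}E\big(|2^ja_iX_i(s^i-1)|^2\wedge1\big)\\&\quad+\sum_{i\ge N_n}E\big(|2^{j+1}a_iX_i|^2\wedge1\big)\,.\end{aligned}\tag{7.233}$$

From (7.228) we observe that

$$\lambda_n2^{j+1}\le1\ \Rightarrow\ \sum_{i\ge N_n}E\big(|2^{j+1}a_iX_i|^2\wedge1\big)\le\sum_{i\ge N_n}E\Big(\frac{|a_iX_i|^2}{\lambda_n^2}\wedge1\Big)=2^n\,.\tag{7.234}$$

Also, for $i<N_{m+1}$ and $m<n$, (7.231) and (7.232) imply $|s^i-1|\le i|s-1|\le N_{m+1}/N_{n+1}\le N_n/N_{n+1}=1/N_n$. Consequently, it follows from (7.228) again that

$$\lambda_m2^j\le N_n\ \Rightarrow\ \sum_{N_m\le i<N_{m+1}}E\big(|2^ja_iX_i(s^i-1)|^2\wedge1\big)\le\sum_{i\ge N_m}E\Big(\frac{|a_iX_i|^2}{\lambda_m^2}\wedge1\Big)=2^m\,.\tag{7.235}$$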


Consider the largest integer $j_n$ which satisfies both $\lambda_n2^{j_n+1}\le1$ and $\lambda_m2^{j_n}\le N_n$ for each $m<n$. Using (7.233), (7.234), and (7.235), we then get

$$\sum_{i\ge1}E\big(|2^{j_n}(Z_i(s)-Z_i(0))|^2\wedge1\big)\le\sum_{0\le m<n}2^m+2^n\le2^{n+1}\,.\tag{7.236}$$

Moreover the definition of $j_n$ shows that either $\lambda_n2^{j_n+2}>1$ (in which case $2^{-j_n}<4\lambda_n$) or else $\lambda_m2^{j_n+1}>N_n$ for some $m<n$ (in which case $2^{-j_n}<2\lambda_m/N_n$), so that

$$2^{-j_n}\le4\Big(\lambda_n+\sum_{0\le m<n}\frac{\lambda_m}{N_n}\Big)\,.\tag{7.237}$$
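The choice of $j_n$ above and the resulting bound (7.237) can be checked mechanically. The following Python sketch (the helpers `N` and `choose_j` and the sample values of $\lambda_m$ are hypothetical, chosen for illustration) finds the largest integer $j$ with $\lambda_n2^{j+1}\le1$ and $\lambda_m2^j\le N_n$ for all $m<n$, then verifies (7.237).

```python
import math

def N(n):
    """N_0 = 1 and N_n = 2^(2^n) for n >= 1."""
    return 1 if n == 0 else 2 ** (2 ** n)

def choose_j(lams, n):
    """Largest integer j with lams[n] * 2^(j+1) <= 1 and lams[m] * 2^j <= N_n for all m < n."""
    Nn = N(n)

    def ok(j):
        return lams[n] * 2.0 ** (j + 1) <= 1 and all(lams[m] * 2.0 ** j <= Nn for m in range(n))

    j = math.floor(-math.log2(lams[n]))  # starting guess; ok(j) is non-increasing in j
    while not ok(j):
        j -= 1
    while ok(j + 1):
        j += 1
    return j

lams = [0.5, 0.25, 0.01]
j2 = choose_j(lams, 2)
print(j2)  # 5
assert 2.0 ** -j2 <= 4 * (lams[2] + sum(lams[m] / N(2) for m in range(2)))  # (7.237)
```

Since both constraints are monotone in $j$, the two `while` loops move the starting guess to the largest admissible integer.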


Let us denote by $U_n$ the set of points $s$ that satisfy (7.231). Thus $U_n$ is an "interval" on the unit circle, the set of points of the type $\exp(ix)$ where $|x|\le\tau_n$, where $0<\tau_n<\pi/2$ satisfies $2\sin(\tau_n/2)=1/N_{n+1}$. Denoting by $\mu$ the Haar measure of $T$, for $n\ge1$ we have $\mu(U_n)=\tau_n/\pi$, so that, since $\tau_n\ge2\sin(\tau_n/2)=1/N_{n+1}$ and $\pi N_{n+1}\le N_{n+1}^2=N_{n+2}$, we certainly have $\mu(U_n)\ge1/N_{n+2}$. Recalling (7.236) we have proved that

$$\mu\Big(\Big\{s\in T\,;\,\sum_{i\ge1}E\big(|2^{j_n}(Z_i(s)-Z_i(0))|^2\wedge1\big)\le2^{n+1}\Big\}\Big)\ge\frac{1}{N_{n+2}}\,,$$

while (7.229) and (7.237) imply that $\sum_{n\ge0}2^n2^{-j_n}<\infty$. Using Theorem 7.5.16 this completes the proof. □
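The two elementary facts used in this proof, inequality (7.232) and the lower bound $\mu(U_n)\ge1/N_{n+2}$, are easy to test numerically. In the following Python sketch the Haar measure on $T$ is normalized arc length, so that the arc $\{e^{ix}\,;\,|x|\le\tau\}$ has measure $\tau/\pi$; the helper names, the random seed and the sampled points are arbitrary illustrative choices.

```python
import cmath
import math
import random

def N(n):
    """N_0 = 1 and N_n = 2^(2^n) for n >= 1."""
    return 1 if n == 0 else 2 ** (2 ** n)

def haar_measure_U(n):
    """mu(U_n) = tau_n / pi, where 2 sin(tau_n / 2) = 1 / N_{n+1}."""
    tau = 2 * math.asin(1 / (2 * N(n + 1)))
    return tau / math.pi

for n in range(1, 5):
    assert haar_measure_U(n) >= 1 / N(n + 2)

# (7.232): |s^i - 1| <= i |s - 1| for s on the unit circle
rng = random.Random(1)
for _ in range(1000):
    s = cmath.exp(1j * rng.uniform(-math.pi, math.pi))
    i = rng.randint(1, 200)
    assert abs(s ** i - 1) <= i * abs(s - 1) + 1e-9
```

The small additive tolerance only absorbs floating-point rounding in the complex powers.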


The following provides a converse of Theorem 7.12.1 under a mild regularity 
condition: 


Theorem 7.12.4 Under the conditions of Theorem 7.12.1, assume moreover that the sequence $(X_i)$ is i.i.d. and that for a certain number $C>0$ one has

$$k\le m\le2k\ \Rightarrow\ |a_k|\le C|a_m|\,.\tag{7.238}$$

Then (7.229) holds whenever the series $\sum_{i\ge1}a_i\varepsilon_iX_i\chi_i$ converges uniformly a.s.


Proof We use Theorem 7.5.16 to obtain a sequence $(j_n)$ with $\sum_{n\ge0}2^n2^{-j_n}<\infty$ and

$$\forall n\ge1\,,\ \mu\Big(\Big\{s\in T\,;\,\sum_{i\ge1}E\big(|2^{j_n}(Z_i(s)-Z_i(0))|^2\wedge1\big)\le2^n\Big\}\Big)\ge\frac{1}{N_n}\,.\tag{7.239}$$

We will prove that (7.239) implies that

$$\lambda_{n+3}\le LC^22^{-j_n}\,,\tag{7.240}$$

completing the proof. The set of $s\in T$ such that $|s-1|<1/(2N_n)$ is of measure $\le1/(\pi N_n)<1/N_n$, so it cannot contain the set considered in (7.239). Thus we can find $s\in T$ with

$$|s-1|\ge\frac{1}{2N_n}\tag{7.241}$$

and

$$\sum_{i\ge1}E\big(|2^{j_n}a_iX_i(s^i-1)|^2\wedge1\big)\le2^n\,,\tag{7.242}$$

where we have also used that $Z_i(s)=a_iX_is^i$. Now let $J=\{i\ge1\,;\,|s^i-1|\ge1/4\}$, so that (7.242) yields

$$\sum_{i\in J}E\big(|2^{j_n-2}a_iX_i|^2\wedge1\big)\le2^n\,,\tag{7.243}$$
and the idea is to compare with (7.228) to bound $\lambda_n$ from above. To implement the idea, we will show that there are many values of $i\ge2^{2^n+3}$ in $J$. Indeed we have

$$\Big|\sum_{2^p\le i<2^{p+1}}s^i\Big|=\Big|s^{2^p}\,\frac{s^{2^p}-1}{s-1}\Big|\le\frac{2}{|s-1|}\,,$$

so that, using (7.241),

$$\Big|\sum_{2^p\le i<2^{p+1}}s^i\Big|\le4N_n\,.$$

Now we have $\sum_{2^p\le i<2^{p+1}}1=2^p$, so that

$$\Big|\sum_{2^p\le i<2^{p+1}}(s^i-1)\Big|\ge2^p-4N_n\,.$$

If $p\ge2^n+3$ we have $2^p\ge8N_n$, and thus

$$2^{p-1}\le2^p-4N_n\le\sum_{2^p\le i<2^{p+1}}|s^i-1|\,.\tag{7.244}$$

Let now

$$I_p=\{i\,;\,2^p\le i<2^{p+1}\,,\ |s^i-1|\ge1/4\}\,.\tag{7.245}$$

Since there are $2^p$ terms on the right-hand side of (7.244), each of which is $\le2$, it follows that

$$2^{p-1}\le\sum_{2^p\le i<2^{p+1}}|s^i-1|\le2\,\mathrm{card}\,I_p+\frac{2^p}{4}\,,$$

so that

$$\mathrm{card}\,I_p\ge2^{p-3}\,.\tag{7.246}$$
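The counting bound (7.246) holds for every $s$ satisfying (7.241) and every $p\ge2^n+3$, so it can be spot-checked numerically. The Python sketch below does this for $n=1$ (so $N_n=4$ and $|s-1|\ge1/8$) on a grid of angles; the grid, the range of $p$ and the helper name `card_I` are arbitrary illustrative choices.

```python
import cmath
import math

def N(n):
    """N_0 = 1 and N_n = 2^(2^n) for n >= 1."""
    return 1 if n == 0 else 2 ** (2 ** n)

def card_I(s, p):
    """card{i : 2^p <= i < 2^(p+1), |s^i - 1| >= 1/4}, as in (7.245)."""
    return sum(1 for i in range(2 ** p, 2 ** (p + 1)) if abs(s ** i - 1) >= 0.25)

n = 1
theta_min = 2 * math.asin(1 / (4 * N(n)))   # |e^{i theta_min} - 1| = 1 / (2 N_n)
for k in range(50):
    theta = theta_min * 1.001 + k * (math.pi - theta_min) / 50   # so that (7.241) holds
    s = cmath.exp(1j * theta)
    for p in range(2 ** n + 3, 2 ** n + 6):
        assert card_I(s, p) >= 2 ** (p - 3)   # (7.246)
```

In practice the count is usually much larger than $2^{p-3}$; the bound only needs to be crude for the rest of the argument.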


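From (7.238), for $2^p\le i<2^{p+1}$ we have $|a_i|\ge|a_{2^p}|/C$, and combining with (7.246) (recall that the $X_i$ are identically distributed),

$$2^{p-3}E\Big(\Big|\frac{2^{j_n-2}a_{2^p}X_{2^p}}{C}\Big|^2\wedge1\Big)\le\sum_{2^p\le i<2^{p+1}}E\big(|2^{j_n}a_iX_i(s^i-1)|^2\wedge1\big)\,.\tag{7.247}$$

Using (7.238) again, for $2^{p-1}\le i<2^p$ we have $|a_{2^p}|\ge|a_i|/C$, and thus

$$2^{-3}\sum_{2^{p-1}\le i<2^p}E\Big(\Big|\frac{2^{j_n-2}a_iX_i}{C^2}\Big|^2\wedge1\Big)\le2^{p-3}E\Big(\Big|\frac{2^{j_n-2}a_{2^p}X_{2^p}}{C}\Big|^2\wedge1\Big)\,.\tag{7.248}$$

Combining with (7.247), summing over $p\ge2^n+3$ and combining with (7.242) yields

$$2^{-3}\sum_{i\ge2^{2^n+2}}E\Big(\Big|\frac{2^{j_n-2}a_iX_i}{C^2}\Big|^2\wedge1\Big)\le2^n\,,\tag{7.249}$$

and, in particular,

$$\sum_{i\ge N_{n+3}}E\Big(\Big|\frac{2^{j_n-2}a_iX_i}{C^2}\Big|^2\wedge1\Big)\le2^{n+3}\,.$$

By definition of $\lambda_{n+3}$, this implies

$$\frac{2^{j_n-2}}{C^2}\le\frac{1}{\lambda_{n+3}}\,.$$

This proves (7.240). □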


To give a still more explicit example, we mention the following: 


Theorem 7.12.5 If $(X_i)$ denotes an i.i.d. sequence distributed like $X$, the series $\sum_{i\ge1}\frac{1}{i}\varepsilon_iX_i\chi_i$ converges uniformly a.s. if and only if

$$E|X|\log\log(|X|+3)<\infty\,.\tag{7.250}$$


Proof Since the sequence $a_k=1/k$ satisfies (7.238), it suffices by Theorems 7.12.1 and 7.12.4 to prove that (7.250) is equivalent to (7.229). The proof uses standard methods that are not related to the ideas of this work. It can be found in Lemma 2.1 of [118]. □


7.13  Vector-Valued Series: A Theorem of Fernique 


This section illustrates X. Fernique’s decisive contributions to the ideas presented in 
this volume. It is a side story, which can be skipped at first reading. We assume 
that the reader has some very basic knowledge about Banach spaces, such as 
formula (19.1). 




We consider a compact Abelian group $T$ and a complex Banach space $E$ (nothing is lost by assuming that $E$ is finite-dimensional). We denote by $\|\cdot\|$ the norm of $E$. Consider (finitely many) vectors $a_i$ of $E$ and characters $\chi_i$ on $T$. Consider independent standard Gaussian r.v.s $g_i$. We are interested in the sum $\sum_ia_ig_i\chi_i(t)$ and more specifically in estimating the quantity

$$E\sup_{t\in T}\Big\|\sum_ia_ig_i\chi_i(t)\Big\|\,.\tag{7.251}$$

We denote by $x^*$ the generic element of the dual $E^*$ of $E$.
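Theorem 7.13.1 ([33]) We have

$$E\sup_{t\in T}\Big\|\sum_ia_ig_i\chi_i(t)\Big\|\le L\Big(E\Big\|\sum_ia_ig_i\Big\|+\sup_{\|x^*\|\le1}E\sup_{t\in T}\Big|\sum_ix^*(a_i)g_i\chi_i(t)\Big|\Big)\,.\tag{7.252}$$

Here $\|x^*\|$ denotes the (dual) norm of $x^*$. It is obvious that

$$E\Big\|\sum_ia_ig_i\Big\|\le E\sup_{t\in T}\Big\|\sum_ia_ig_i\chi_i(t)\Big\|$$

and

$$\sup_{\|x^*\|\le1}E\sup_{t\in T}\Big|\sum_ix^*(a_i)g_i\chi_i(t)\Big|\le E\sup_{t\in T}\Big\|\sum_ia_ig_i\chi_i(t)\Big\|\,.$$

Thus the bound (7.252) is of the correct order. Furthermore the quantities in the right-hand side are simpler than the left-hand side.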


Proof The overall strategy of proof is the obvious one. We know how to estimate the supremum of a Gaussian process from the value of the functional $\gamma_2(T,d)$, and we have to relate the distances corresponding to the different Gaussian processes occurring in (7.252).

Let us denote by $E_1^*$ the unit ball of $E^*$. For $(x^*,t)\in E_1^*\times T$ we set $X_{x^*,t}=\sum_ix^*(a_i)g_i\chi_i(t)$, so that

$$E\sup_{t\in T}\Big\|\sum_ia_ig_i\chi_i(t)\Big\|=E\sup_{(x^*,t)\in E_1^*\times T}|X_{x^*,t}|\,.\tag{7.253}$$

The canonical distance on $E_1^*\times T$ associated with the Gaussian process $(X_{x^*,t})$ is given by

$$d((x^*,s),(y^*,t))^2=\sum_i|x^*(a_i)\chi_i(s)-y^*(a_i)\chi_i(t)|^2\,.\tag{7.254}$$
Denoting by $LS$ the right-hand side of (7.252), the goal is to prove that

$$\gamma_2(E_1^*\times T,d)\le LS\,.\tag{7.255}$$

On $E_1^*$ we consider the distance $\delta$ given by

$$\delta(x^*,y^*)^2=\sum_i|x^*(a_i)-y^*(a_i)|^2\,.\tag{7.256}$$

Using Lemma 7.2.4 we obtain

$$\gamma_2(E_1^*,\delta)\le LE\sup_{x^*\in E_1^*}|X_{x^*,0}|=LE\Big\|\sum_ia_ig_i\Big\|\le LS\,.\tag{7.257}$$

Given $z^*\in E_1^*$ we consider the following distance on $T$:

$$d_{z^*}(s,t)^2=\sum_i|z^*(a_i)\chi_i(s)-z^*(a_i)\chi_i(t)|^2\,,\tag{7.258}$$

so that by Lemma 7.2.4 again we obtain

$$\forall z^*\in E_1^*\,,\ \gamma_2(T,d_{z^*})\le LE\sup_{t\in T}\Big|\sum_iz^*(a_i)g_i\chi_i(t)\Big|\le LS\,.\tag{7.259}$$

Since the distance $d_{z^*}$ is translation-invariant, combining (7.4) and (7.5) yields

$$\sum_{n\ge0}2^{n/2}\epsilon_n(T,d_{z^*})\le LS\,.\tag{7.260}$$

The next task is to relate the distance $d$ with the distances $\delta$ and $d_{z^*}$. First, since $|\chi_i(t)|=1$, we have

$$d((x^*,t),(y^*,t))=\delta(x^*,y^*)\,,\tag{7.261}$$

and also

$$d((x^*,s),(x^*,t))=d_{x^*}(s,t)\,.\tag{7.262}$$

Given $x^*,y^*,z^*\in E_1^*$ and $s,t\in T$, we then have

$$\begin{aligned}d((x^*,s),(y^*,t))&\le d((x^*,s),(z^*,s))+d((z^*,s),(z^*,t))+d((z^*,t),(y^*,t))\\&=\delta(x^*,z^*)+d_{z^*}(s,t)+\delta(y^*,z^*)\,.\end{aligned}\tag{7.263}$$
We first note that this implies

$$\Delta(E_1^*\times T,d)\le2\Delta(E_1^*,\delta)+\sup_{z^*\in E_1^*}\Delta(T,d_{z^*})\le LS\,,\tag{7.264}$$

using (7.257) and (7.259) in the last inequality. In the remainder of the proof, we deduce (7.255) from (7.257), (7.263), and (7.260), which finishes the proof using Theorem 2.7.11. Let us consider an admissible sequence $(\mathcal{A}_n)$ of partitions of $E_1^*$ such that

$$\sup_{x^*\in E_1^*}\sum_{n\ge0}2^{n/2}\Delta(A_n(x^*),\delta)\le LS\,.\tag{7.265}$$

Given $A\in\mathcal{A}_n$, let us select a point $z^*(n,A)\in A$ for which

$$\epsilon_n(T,d_{z^*(n,A)})\le2\inf\{\epsilon_n(T,d_{z^*})\,;\,z^*\in A\}\,.\tag{7.266}$$

We then construct a partition $\mathcal{C}_{A,n}$ of $T$ into $N_n$ sets, each of which has diameter $\le4\epsilon_n(T,d_{z^*(n,A)})$ for the distance $d_{z^*(n,A)}$. We consider the partition $\mathcal{B}_n'$ of $E_1^*\times T$ into sets of the type $A\times C$ where $A\in\mathcal{A}_n$ and $C\in\mathcal{C}_{A,n}$. Its cardinality is $\le N_n^2=N_{n+1}$. Let us define $\mathcal{B}_n$ as the partition of $E_1^*\times T$ generated by $\mathcal{B}_0',\dots,\mathcal{B}_n'$, so that as usual the sequence $(\mathcal{B}_n)$ increases and $\mathrm{card}\,\mathcal{B}_n\le N_{n+2}$. Consider a point $(x^*,t)\in E_1^*\times T$. Then, denoting by $B_n(x^*,t)$ the set of $\mathcal{B}_n$ which contains this point, we have

$$B_n(x^*,t)\subset A\times C\,,$$

where $A=A_n(x^*)$ and $C$ is the element of the partition $\mathcal{C}_{A,n}$ that contains $t$. For any $z^*\in A$, (7.263) implies

$$\Delta(B_n(x^*,t),d)\le L\big(\Delta(A_n(x^*),\delta)+\Delta(C,d_{z^*})\big)\,.\tag{7.267}$$

Using the definition of the partition $\mathcal{C}_{A,n}$ in the first inequality and the choice of $z^*(n,A)$ in the second one, we obtain

$$\Delta(C,d_{z^*(n,A)})\le4\epsilon_n(T,d_{z^*(n,A)})\le8\epsilon_n(T,d_{x^*})\,,$$

and therefore, using (7.267) for $z^*=z^*(n,A)$, we get

$$\Delta(B_n(x^*,t),d)\le L\big(\Delta(A_n(x^*),\delta)+\epsilon_n(T,d_{x^*})\big)\,.$$

It then follows from (7.260) and (7.265) that

$$\sum_{n\ge0}2^{n/2}\Delta(B_n(x^*,t),d)\le LS\,,$$

so that combining with Lemma 2.9.10 for $\tau=2$ and using also (7.264) yields (7.255) and finishes the proof. □
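The identities (7.261) and (7.262) and the triangle-type bound (7.263) only involve finite sums, so they can be checked numerically. In the Python sketch below, as illustrative assumptions: $E=\mathbb{C}^2$, a functional $x^*$ is represented by a pair of coefficients acting through $x^*(v)=x_1v_1+x_2v_2$, the group is the unit circle with $\chi_i(t)=t^i$, and all data are random. The unit-ball constraint $\|x^*\|\le1$ is not enforced, since these three relations do not use it.

```python
import cmath
import math
import random

rng = random.Random(2)
K = 6  # number of vectors a_i in E = C^2

def rc():
    return complex(rng.gauss(0, 1), rng.gauss(0, 1))

a = [(rc(), rc()) for _ in range(K)]

def pair(x, v):
    """x*(v) for the functional x* represented by the coefficient pair x."""
    return x[0] * v[0] + x[1] * v[1]

def chi(i, t):
    return t ** i  # chi_i(t) = t^i

def d(xs, ys):  # canonical distance (7.254)
    (x, s), (y, t) = xs, ys
    return math.sqrt(sum(abs(pair(x, ai) * chi(i, s) - pair(y, ai) * chi(i, t)) ** 2
                         for i, ai in enumerate(a, start=1)))

def delta(x, y):  # distance (7.256)
    return math.sqrt(sum(abs(pair(x, ai) - pair(y, ai)) ** 2 for ai in a))

def d_z(z, s, t):  # distance (7.258)
    return math.sqrt(sum(abs(pair(z, ai)) ** 2 * abs(chi(i, s) - chi(i, t)) ** 2
                         for i, ai in enumerate(a, start=1)))

for _ in range(200):
    x, y, z = (rc(), rc()), (rc(), rc()), (rc(), rc())
    s = cmath.exp(1j * rng.uniform(0, 2 * math.pi))
    t = cmath.exp(1j * rng.uniform(0, 2 * math.pi))
    assert math.isclose(d((x, t), (y, t)), delta(x, y), rel_tol=1e-9)   # (7.261)
    assert math.isclose(d((x, s), (x, t)), d_z(x, s, t), rel_tol=1e-9)  # (7.262)
    assert d((x, s), (y, t)) <= delta(x, z) + d_z(z, s, t) + delta(y, z) + 1e-9  # (7.263)
```

Such a check says nothing about the chaining part of the proof; it only confirms the elementary comparison of distances on which that part rests.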


Does (7.252) remain true when the Gaussian r.v.s are replaced by Bernoulli r.v.s? That is, is it true that

$$E\sup_{t\in T}\Big\|\sum_i\varepsilon_ia_i\chi_i(t)\Big\|\le LE\Big\|\sum_i\varepsilon_ia_i\Big\|+L\sup_{x^*\in E_1^*}E\sup_{t\in T}\Big|\sum_i\varepsilon_ix^*(a_i)\chi_i(t)\Big|\ ?\tag{7.268}$$

It is while pondering this question that the author formulated the Bernoulli conjecture.


Exercise 7.13.2 Use Theorem 6.2.8 to prove (7.268). If you find this exercise too difficult, its solution can be found in [53].
Key Ideas to Remember


• For a translation-invariant distance on a compact group $T$, the entropy numbers are basically determined by the Haar measure of the balls of a given radius, regardless of their shape. This is a tremendous simplification. The generic chaining is not needed and entropy numbers suffice.

• Consider characters $\chi_i$ (none of them constant), and consider (finitely many) complex numbers $a_i$. Consider the distance on $T$ given by $d(s,t)^2=\sum_i|a_i|^2|\chi_i(s)-\chi_i(t)|^2$. Denote by $\varepsilon_i$ independent signs and by $g_i$ independent standard Gaussian r.v.s. Then the contraction principle ensures that

$$E\sup_{t\in T}\Big|\sum_ia_i\varepsilon_i\chi_i(t)\Big|\le LE\sup_{t\in T}\Big|\sum_ia_ig_i\chi_i(t)\Big|\,.$$

The Marcus-Pisier theorem asserts that

$$E\sup_{t\in T}\Big|\sum_ia_ig_i\chi_i(t)\Big|\le L\gamma_2(T,d)\le LE\sup_{t\in T}\Big|\sum_ia_i\varepsilon_i\chi_i(t)\Big|\,.$$

• From now on $\xi_i$ denote independent symmetric r.v.s. One has the general bound

$$E\sup_{t\in T}\Big|\sum_i\xi_i\chi_i(t)\Big|\le L\gamma_2(T,d)\,,\tag{7.269}$$

where now the distance $d$ is given by $d(s,t)^2=\sum_iE|\xi_i|^2|\chi_i(s)-\chi_i(t)|^2$.

• When the variables $\xi_i$ are not square-integrable, the main problem in controlling the quantity $E\sup_{t\in T}|\sum_i\xi_i\chi_i(t)|$ is to control the typical value of $\gamma_2(T,d_\omega)$, where $d_\omega$ is the random distance given by $d_\omega(s,t)^2=\sum_i|\xi_i|^2|\chi_i(s)-\chi_i(t)|^2$. The characteristics of $T$ which make such a control possible apparently cannot be described using a single distance, but can be described using a "family of distances". This feature will occur in many problems we will study later.

• Through rather general principles (which will later receive a considerable generalization), one proves that the control of the typical value of $\gamma_2(T,d_\omega)$ implies a suitable smallness of $T$ (as appropriately measured through a certain family of distances). Thus a control from above of $E\sup_{t\in T}|\sum_i\xi_i\chi_i(t)|$ implies a suitable "smallness" of $T$.

• Besides the bound (7.269), one has the trivial bound

$$E\sup_{t\in T}\Big|\sum_i\xi_i\chi_i(t)\Big|\le\sum_iE|\xi_i|\,.\tag{7.270}$$

In a precise way, as stated in Theorem 7.5.14, every situation is a mixture of (7.269) and (7.270): we can find a decomposition $\xi_i=\xi_i'+\xi_i''$ such that $\sum_iE|\xi_i'|\le LE\sup_{t\in T}|\sum_i\xi_i\chi_i(t)|$ and $\gamma_2(T,d)\le LE\sup_{t\in T}|\sum_i\xi_i\chi_i(t)|$, where the distance $d$ is given by $d(s,t)^2=\sum_iE|\xi_i''|^2|\chi_i(s)-\chi_i(t)|^2$.

• Consider independent symmetric r.v.s $(\xi_i)_{i\ge1}$. The historically important problem of the uniform a.s. convergence of random Fourier series of the type $\sum_{i\ge1}\xi_i\chi_i(t)$ is now completely understood, and the solution is unexpectedly simple. There are three rather different cases where this convergence holds: the case $\sum_{i\ge1}P(\xi_i\ne0)<\infty$, the case $\sum_{i\ge1}E|\xi_i|<\infty$, and the case $\gamma_2(T,d)<\infty$, where the distance $d$ is defined by $d(s,t)^2=\sum_{i\ge1}E|\xi_i|^2|\chi_i(s)-\chi_i(t)|^2$. Conversely, every case where this convergence holds is in a precise sense a mixture of the previous three cases; see Theorem 7.5.17.
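A rough numerical illustration of the contraction principle in the second item above: the Python sketch below estimates $E\sup_t|\sum_ia_i\varepsilon_i\chi_i(t)|$ and $E\sup_t|\sum_ia_ig_i\chi_i(t)|$ by Monte Carlo with $a_i=1$ and $\chi_i(t)=t^i$ for $1\le i\le16$, taking the supremum over a finite grid of $T$ (the contraction principle applies to any finite index set). The sizes, the seed, and the helper names are arbitrary illustrative choices; the comparison constant $\sqrt{\pi/2}$ is the classical constant of the contraction principle between Bernoulli and Gaussian averages.

```python
import cmath
import math
import random

def sup_modulus(coeffs, table):
    """max over grid points t of |sum_i coeffs[i-1] * t^i|; table[k] lists the powers t_k^i."""
    return max(abs(sum(c * p for c, p in zip(coeffs, row))) for row in table)

def estimate(n_terms=16, grid_size=64, samples=400, seed=3):
    """Monte Carlo estimates of E sup_t |sum eps_i t^i| and E sup_t |sum g_i t^i|."""
    rng = random.Random(seed)
    grid = [cmath.exp(2j * math.pi * k / grid_size) for k in range(grid_size)]
    table = [[t ** i for i in range(1, n_terms + 1)] for t in grid]
    bern = gauss = 0.0
    for _ in range(samples):
        bern += sup_modulus([rng.choice((-1.0, 1.0)) for _ in range(n_terms)], table)
        gauss += sup_modulus([rng.gauss(0.0, 1.0) for _ in range(n_terms)], table)
    return bern / samples, gauss / samples

b, g = estimate()
print(f"Bernoulli ~ {b:.2f}   Gaussian ~ {g:.2f}   sqrt(pi/2) x Gaussian ~ {math.sqrt(math.pi / 2) * g:.2f}")
```

The two estimates are typically of the same order, which is consistent with the two-sided comparison provided by the Marcus-Pisier theorem in this translation-invariant setting.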


7.14 Notes and Comments 


The characterization of convergence of random Fourier series is almost achieved 
in the paper [108]. This paper uses the most natural approach to upper bounds: 
chaining arguments for Bernoulli processes. The paper [108] still required some 
weak but unnecessary tail conditions, because the chaining was not organized in 
an optimal way. It is only while writing [132] that the author finally succeeded in 
removing all extraneous conditions, by organizing the chaining as is now done in 
Theorem 9.2.1. Random Fourier series have the particularity that the proof of upper bounds is much more difficult than the proof of lower bounds, whereas the opposite is often the case. The simpler arguments we present now were discovered much later.

I have given the “magic proof” of Proposition 7.5.13 as an homage to the paper [62] of Marcus and Pisier, which had a considerable influence on my own research. However, now that I understand things better, I feel that $p$-stable r.v.s are not intrinsically related to this problem and that it is simply a coincidence that they happen to have a tail in $u^{-p}$, so that one could argue that bringing them to bear on this question is a “trick” rather than a method and is somewhat misleading.
Consider independent symmetric r.v.s $\eta_i$. We have (basically) controlled $E\sup_{t\in T}|\sum_ia_i\eta_i\chi_i(t)|$ using characteristics of $T$ which involve a family of distances. One could ask for which r.v.s $\eta_i$ these characteristics can be expressed as a function of a single distance. While we have not tried to prove this, it seems that, besides the case $E|\eta_i|^2<\infty$, the only case is the case of tails in $u^{-p}$ (the characteristics are then expressed using the distance $d_p$; see (7.90) and (7.92)). This is one reason why this case has some importance.

I will end with a personal touch. I have been thinking about random Fourier series for over 35 years, and it is quite amazing that I could still make progress after all these efforts.


Chapter 8
Partitioning Scheme and Families of Distances


In the previous chapter, in the setting of random Fourier series, we introduced the 
idea that it does not suffice to use one single distance to control certain stochastic 
processes, but that a “whole family of distances is required”; see (7.63). The 
situation was however made easier by translation invariance, in the sense that 
covering numbers provide an accurate description of the “size” of the space with 
respect to these distances. This will no longer be the case in general. For an accurate 
description, we need to generalize the tools of Chap. 2 to “families of distances”. In
Sect. 8.1 we generalize to the setting of “families of distances” the first partitioning
scheme of Sect. 2.9, and the reader should first master that result. In Sect. 8.3 we
will apply this tool to the study of “canonical processes”. In order to study canonical
processes, we first need precise estimates on the tails of certain r.v.s, and these are 
the goal of Sect. 8.2. The present section can be seen as a far-reaching generalization 
of the majorizing measure theorem 2.10.1, but none of the further material depends 
on it. 


8.1 The Partitioning Scheme 


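We consider a family of maps $(\varphi_j)_{j\in\mathbb{Z}}$ with the following properties:

$$\varphi_j:T\times T\to\mathbb{R}^+\cup\{\infty\}\,,\quad\varphi_{j+1}\ge\varphi_j\ge0\,,\quad\varphi_j(s,t)=\varphi_j(t,s)\,.\tag{8.1}$$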


Such maps were of fundamental use in the previous chapter; see (7.63). In many of 
our applications, the maps ¢; will be squares of distances and will satisfy a version 
of the triangle inequality. We however do not assume that this is the case: in the 
setting of Sect. 8.3 such an inequality is not satisfied. 
We define

$$B_j(t,c)=\{s\in T\,;\,\varphi_j(t,s)\le c\}\,.$$

We recall that a functional $F$ on a set $T$ is a non-decreasing map from the subsets of $T$ to $\mathbb{R}^+$. We consider functionals $F_{n,j}$ on $T$ for $n\ge0$, $j\in\mathbb{Z}$. We assume

$$F_{n+1,j}\le F_{n,j}\,;\quad F_{n,j+1}\le F_{n,j}\,.\tag{8.2}$$

We will assume that the functionals $F_{n,j}$ satisfy a “growth condition” very similar in spirit to Definition 2.8.3. This condition involves as main parameter an integer $\kappa\ge5$. We set $r=2^{\kappa-3}$, so that $r\ge4$. The role of $r$ is as in (2.76): the larger $r$, the weaker the growth condition.¹


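Definition 8.1.1 We say that the functionals $F_{n,j}$ satisfy the growth condition (for
$r$) if the following occurs. Consider any $j\in\mathbb Z$, any $n\ge1$ and $m=N_n$. Consider
any sets $(H_\ell)_{\ell\le m}$ that are separated in the following sense: There exist points
$u,t_1,\ldots,t_m$ in $T$ for which $H_\ell\subset B_{j+2}(t_\ell,2^{n+\kappa})$ and
$$\forall\,\ell,\ell'\le m,\ \ell\ne\ell',\quad \varphi_{j+1}(t_\ell,t_{\ell'})\ge 2^{n+1}, \tag{8.3}$$
$$\forall\,\ell\le m,\quad t_\ell\in B_j(u,2^n). \tag{8.4}$$
Then
$$F_{n,j}\Big(\bigcup_{\ell\le m}H_\ell\Big)\ge 2^nr^{-j-1}+\min_{\ell\le m}F_{n+1,j+1}(H_\ell). \tag{8.5}$$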


We have not made assumptions on how $\varphi_j$ relates to $\varphi_{j+1}$, but we have little
chance to prove (8.5) unless $B_{j+2}(t_\ell,2^{n+\kappa})$ is quite smaller than $B_{j+1}(t_\ell,2^{n+1})$.

As we already stressed, the best way to illustrate a statement about families of
distances is to carry out the case where
$$\varphi_j(s,t)=r^{2j}d(s,t)^2 \tag{8.6}$$
for a distance $d$ on $T$. Denoting by $B(t,b)$ the ball for $d$ of center $t$ and radius $b$, we
then have
$$B_j(t,c)=B(t,r^{-j}\sqrt c\,).$$


¹The reason why we take $r$ of the type $r=2^{\kappa-3}$ for an integer $\kappa$ is purely for technical
convenience.
Thus in (8.3) we require that
$$\forall\,\ell,\ell'\le m,\ \ell\ne\ell',\quad d(t_\ell,t_{\ell'})\ge 2^{(n+1)/2}r^{-j-1}:=a. \tag{8.7}$$
On the other hand,
$$B_{j+2}(t_\ell,2^{n+\kappa})=B(t_\ell,2^{(n+\kappa)/2}r^{-j-2})=B(t_\ell,\eta a),$$
where $\eta:=2^{(\kappa-1)/2}/r=2/\sqrt r$. Thus the condition $H_\ell\subset B_{j+2}(t_\ell,2^{n+\kappa})$ means
that $H_\ell\subset B(t_\ell,\eta a)$. As $r$ gets larger, $\eta$ gets smaller, and recalling (8.7), this means
that the sets $H_\ell$ become better separated, in the sense that they become smaller
compared to their mutual distances. Also, (8.5) reads as
$$F_{n,j}\Big(\bigcup_{\ell\le m}H_\ell\Big)\ge 2^{(n-1)/2}a+\min_{\ell\le m}F_{n+1,j+1}(H_\ell),$$
which strongly resembles (2.77). Thus, we should think of the term $r^{-j-1}$ in the
right-hand side of (8.5) as a normalization factor and the condition (8.5) as being
uniform over $j$.
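As a quick sanity check of the arithmetic in the case (8.6), the following Python sketch verifies numerically, for a few values of $\kappa$, $n$ and $j$, that with $r=2^{\kappa-3}$ the $d$-radius $2^{(n+\kappa)/2}r^{-j-2}$ equals $\eta a$ with $\eta=2/\sqrt r$, and that $2^nr^{-j-1}=2^{(n-1)/2}a$. The helper name is ours, for illustration only.

```python
import math


def check_case_8_6(kappa: int, n: int, j: int) -> tuple[float, float]:
    """Relative errors of the two identities used in the case (8.6).

    With r = 2^(kappa-3) and a = 2^((n+1)/2) r^(-j-1) as in (8.7):
      (i)  2^((n+kappa)/2) r^(-j-2) = eta * a with eta = 2 / sqrt(r),
      (ii) 2^n r^(-j-1) = 2^((n-1)/2) * a.
    """
    r = 2.0 ** (kappa - 3)
    a = 2.0 ** ((n + 1) / 2) * r ** (-j - 1)
    eta = 2.0 / math.sqrt(r)
    radius = 2.0 ** ((n + kappa) / 2) * r ** (-j - 2)  # d-radius of B_{j+2}(t, 2^(n+kappa))
    err_ball = abs(radius - eta * a) / (eta * a)
    growth_term = 2.0 ** n * r ** (-j - 1)             # first term on the right of (8.5)
    err_growth = abs(growth_term - 2.0 ** ((n - 1) / 2) * a) / growth_term
    return err_ball, err_growth


for kappa in (5, 6, 9):
    for n in (1, 4, 10):
        for j in (-3, 0, 2):
            e_ball, e_growth = check_case_8_6(kappa, n, j)
            assert e_ball < 1e-12 and e_growth < 1e-12
print("both identities hold up to rounding")
```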


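Theorem 8.1.2 Assume that the functionals $F_{n,j}$ are as above and in particular
satisfy the growth condition of Definition 8.1.1 and that, for some $j_0\in\mathbb Z$, we have
$$\forall\,s,t\in T,\quad \varphi_{j_0}(s,t)\le1. \tag{8.8}$$
Assume also that²
$$\forall\,s,t\in T,\ \forall\,j\in\mathbb Z,\quad \varphi_{j+1}(s,t)\ge r\varphi_j(s,t). \tag{8.9}$$
Then there exists an admissible sequence $(\mathcal A_n)$ and for each $A\in\mathcal A_n$ an integer
$j_n(A)\in\mathbb Z$ and a point $t_{n,A}\in T$ such that
$$A\in\mathcal A_n,\ C\in\mathcal A_{n-1},\ A\subset C\ \Rightarrow\ j_{n-1}(C)\le j_n(A)\le j_{n-1}(C)+1, \tag{8.10}$$
$$\forall\,t\in T,\quad \sum_{n\ge0}2^nr^{-j_n(A_n(t))}\le L\big(rF_{0,j_0}(T)+r^{-j_0}\big), \tag{8.11}$$
$$\forall\,n\ge0,\ \forall\,A\in\mathcal A_n,\quad A\subset B_{j_n(A)}(t_{n,A},2^n). \tag{8.12}$$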


²In [132] the present theorem is stated without assuming this condition, but the proof given there is
in error. The condition (8.9) is a very mild extra hypothesis, since in the separation condition, we
have already implicitly assumed that $B_{j+2}(t_\ell,2^{n+\kappa})=B_{j+2}(t_\ell,(4r)2^{n+1})$ is quite smaller than
$B_{j+1}(t_\ell,2^{n+1})$.
Let us stress that we do not require that $t_{n,A}\in A$. Let us also note the new feature
of (8.12) compared to our previous constructions. We do not control the size of the
elements $A$ of $\mathcal A_n$ by requiring that they are of "small diameter" (in the sense of
controlling $\varphi_{j_n(A)}(s,t)$ from above for all $s,t\in A$), but by the condition (8.12),
requiring that they are contained "in a small ball". This twist is required due to the
possible failure of (any form of) the triangle inequality for the "distance" $\varphi_{j_n(A)}$.

To illustrate this result, we again carry out the case (8.6) (although in that case we
do not have problems with the triangle inequality). Then (8.8) means that $\Delta(T,d)\le
r^{-j_0}$, while (8.12) implies $\Delta(A,d)\le2r^{-j_n(A)}2^{n/2}$. Moreover (8.11) implies
$$\forall\,t\in T,\quad \sum_{n\ge0}2^{n/2}\Delta(A_n(t),d)\le L\big(rF_{0,j_0}(T)+r^{-j_0}\big).$$
Taking for $j_0$ the largest integer such that $\Delta(T,d)\le r^{-j_0}$, we get
$$\gamma_2(T,d)\le Lr\big(F_{0,j_0}(T)+\Delta(T,d)\big),$$
which is very similar to Theorem 2.9.1.
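In the case (8.6), the choice of $j_0$ made above is explicit. A small Python sketch (the helper name is hypothetical) computes the largest integer $j_0$ with $\Delta(T,d)\le r^{-j_0}$ and checks the inequality $r^{-j_0}\le r\Delta(T,d)$, which is what turns $L(rF_{0,j_0}(T)+r^{-j_0})$ into $Lr(F_{0,j_0}(T)+\Delta(T,d))$.

```python
import math


def largest_j0(diam: float, r: float) -> int:
    """Largest integer j0 with diam <= r**(-j0); requires diam > 0 and r > 1."""
    j0 = math.floor(-math.log(diam) / math.log(r))
    # Correct possible floating-point rounding near exact powers of r.
    while diam > r ** (-j0):
        j0 -= 1
    while diam <= r ** (-(j0 + 1)):
        j0 += 1
    return j0


r = 4.0
for diam in (0.01, 0.25, 0.3, 1.0, 10.0):
    j0 = largest_j0(diam, r)
    assert diam <= r ** (-j0)          # (8.8) holds for this j0 in the case (8.6)
    assert r ** (-j0) <= r * diam      # maximality of j0 gives r^(-j0) <= r * Delta(T, d)
```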
The proof of Theorem 8.1.2 relies on the following, where again the functionals 
are as above: 


Lemma 8.1.3 Consider a set $C\subset T$, and assume that for some integers $j\in\mathbb Z$ and
$n\ge1$ and for some $u\in T$, we have $C\subset B_j(u,2^n)$. Then we can find a partition
$(A_\ell)_{\ell\le m}$ of $C$, where $m=N_n$, such that for each $\ell\le m$ we have either
$$\exists\,t_\ell\in C,\quad A_\ell\subset B_{j+1}(t_\ell,2^{n+1}) \tag{8.13}$$
or else
$$2^nr^{-j-1}+\sup_{s\in A_\ell}F_{n+1,j+1}\big(A_\ell\cap B_{j+2}(s,2^{n+\kappa})\big)\le F_{n,j}(C). \tag{8.14}$$


Proof Consider the set
$$D:=\big\{s\in C;\ 2^nr^{-j-1}+F_{n+1,j+1}\big(C\cap B_{j+2}(s,2^{n+\kappa})\big)>F_{n,j}(C)\big\}.$$
As in Lemma 2.9.4, it follows from (8.5) that $D$ can be covered by $<m$ balls of the
type $B_{j+1}(t_\ell,2^{n+1})$. Thus we can partition $D$ in $<m$ sets $A_\ell$ which satisfy (8.13):
$A_\ell\subset B_{j+1}(t_\ell,2^{n+1})$, $t_\ell\in C$. The required partition consists of these sets together
with the set $C\setminus D$, which automatically satisfies (8.14). □


Proof of Theorem 8.1.2 Let us repeat that the reader should be comfortable with
the proof of Theorem 2.9.1, as many features here are nearly identical. To start the
construction, we define $\mathcal A_0=\{T\}$, $j_0(T)=j_0$, and take any point of $T$ for $t_{0,T}$.
To construct $\mathcal A_{n+1}$ once $\mathcal A_n$ has been constructed, to each element $C$ of $\mathcal A_n$ we
apply Lemma 8.1.3 with $j=j_n(C)$ and $u=t_{n,C}$ to split $C$ into $m=N_n$ pieces
$A_1,\ldots,A_m$. (Thus, the sequence $(\mathcal A_n)$ is admissible since $N_n^2\le N_{n+1}$.) Let $A$ be
one of these sets.

When $A$ satisfies (8.14), we set $j_{n+1}(A)=j=j_n(C)$ and $t_{n+1,A}=t_{n,C}$, so
that (8.12) for $A$ and $n+1$ follows from the same relation for $C$ and $n$.

When $A=A_\ell$ satisfies (8.13), we define instead $j_{n+1}(A)=j+1$ and $t_{n+1,A}=t_\ell$.
Thus (8.12) holds for $A$ and $n+1$. Our construction satisfies the further important
property that
$$A\in\mathcal A_{n+1},\ C\in\mathcal A_n,\ A\subset C,\ j_{n+1}(A)=j_n(C)\ \Rightarrow\ t_{n+1,A}=t_{n,C}, \tag{8.15}$$
$$A\in\mathcal A_{n+1},\ C\in\mathcal A_n,\ A\subset C,\ j_{n+1}(A)=j_n(C)+1\ \Rightarrow\ t_{n+1,A}\in C. \tag{8.16}$$
Let us now prove that
$$A\in\mathcal A_{n'},\ C\in\mathcal A_n,\ n'>n,\ A\subset C,\ j_{n'}(A)>j_n(C)\ \Rightarrow\ t_{n',A}\in C. \tag{8.17}$$
For $n\le s\le n'$ let us denote by $A_s$ the unique element of $\mathcal A_s$ with $A\subset A_s$, so that
$A_{n'}=A$ and $A_n=C$. Let $n''$ be the largest integer with $j_{n''}(A_{n''})<j_{n'}(A)$, so that
$n\le n''<n'$. Thus for $n''+1\le k\le n'$, we have $j_k(A_k)=j_{n'}(A_{n'})$. The value of
$j_k(A_k)$ does not increase over this interval, and as a consequence of (8.15), the value
of $t_{k,A_k}$ does not change over this interval, i.e., it holds that $t_{n',A}=t_{n''+1,A_{n''+1}}$.
Furthermore, since (8.10) forces $j_{n''+1}(A_{n''+1})=j_{n''}(A_{n''})+1$, (8.16) gives
$t_{n''+1,A_{n''+1}}\in A_{n''}\subset A_n=C$, proving (8.17).

Since (8.10) holds by construction, it remains only to prove (8.11). Let us fix
once and for all a point $t\in T$, and to lighten notation, let $j(n)=j_n(A_n(t))$ and
$a(n)=2^nr^{-j(n)}$, so that we have to bound $\sum_{n\ge0}a(n)$. Consider the set
$$J=\{0\}\cup\big\{n\ge1;\ j(n-1)=j(n),\ j(n+1)=j(n)+1\big\},$$
and let us enumerate $J$ as $0=n_0<n_1<n_2<\cdots$, so that $j(n_{k+1})\ge j(n_k+1)=
j(n_k)+1$. Since $a(n+1)=2r^{j(n)-j(n+1)}a(n)$, Lemma 2.9.5 used for $\alpha=2$
implies that $\sum_{n\ge0}a(n)\le L\sum_{n\in J}a(n)$ (as in (2.91)). We apply a second time
Lemma 2.9.5 with $\alpha=2$ to the sequence $(a(n_k))_{k\ge0}$. Defining
$$I=\{0\}\cup\big\{n_k;\ k\ge1;\ \forall\,k'\ge1,\ k'\ne k,\ a(n_{k'})<a(n_k)2^{|k-k'|}\big\},$$
Lemma 2.9.5 implies $\sum_{n\in J}a(n)\le L\sum_{n\in I}a(n)$, and it suffices to bound this latter
sum.
Consider then $n_k\in I$, so that $a(n_{k+1})<2a(n_k)$, i.e.,
$$2^{n_{k+1}}r^{-j(n_{k+1})}<2^{n_k+1}r^{-j(n_k)}. \tag{8.18}$$
Thus $n^*:=n_{k+1}+1$ satisfies
$$j(n^*)=j(n_{k+1}+1)=j(n_{k+1})+1\ge j(n_k)+2 \tag{8.19}$$
and, by (8.12),
$$A_{n^*}(t)\subset B_{j(n^*)}(s,2^{n^*}),$$
where $s=t_{n^*,A_{n^*}(t)}\in A_{n_k}(t)$ by (8.17). We can also rewrite (8.18) as
$$2^{n^*}r^{-j(n^*)+j(n_k)+2}<r2^{n_k+2}=2^{n_k+\kappa-1}. \tag{8.20}$$
Now, as a consequence of (8.9), we have $B_{j+1}(s,u)\subset B_j(s,u/r)$ and thus, using
also (8.20) in the last inclusion,
$$B_{j(n^*)}(s,2^{n^*})\subset B_{j(n_k)+2}\big(s,2^{n^*}r^{j(n_k)+2-j(n^*)}\big)\subset B_{j(n_k)+2}(s,2^{n_k+\kappa}).$$
Since $A_{n^*}(t)\subset A_{n_k}(t)$, we then have
$$F_{n_k,j(n_k)+1}(A_{n^*}(t))\le F_{n_k,j(n_k)+1}\big(A_{n_k}(t)\cap B_{j(n_k)+2}(s,2^{n_k+\kappa})\big). \tag{8.21}$$
Assuming now $k\ge1$, we have $j(n_k-1)=j(n_k)$, so that setting $n=n_k-1$, we
have $j(n+1)=j(n)$. It follows by construction that when we split $C=A_n(t)$
according to Lemma 8.1.3, $A_{n_k}(t)=A_{n+1}(t)$ is a piece $A_\ell$ that satisfies (8.14), so
that in particular
$$\frac1r a(n)+\sup_{s\in A_{n_k}(t)}F_{n_k,j(n_k)+1}\big(A_{n_k}(t)\cap B_{j(n_k)+2}(s,2^{n_k+\kappa})\big)\le F_{n_k-1,j(n_k)}(A_{n_k-1}(t)). \tag{8.22}$$
It then follows from (8.21) that
$$\frac1r a(n)\le F_{n_k-1,j(n_k)}(A_{n_k-1}(t))-F_{n_k,j(n_k)+1}(A_{n^*}(t)). \tag{8.23}$$
Let now $f(k):=F_{n_k,j(n_k)}(A_{n_k}(t))$. Since $n_{k-1}\le n_k-1$ we have $A_{n_k-1}(t)\subset
A_{n_{k-1}}(t)$. Since $j(n_k)\ge j(n_{k-1})$, we have, using (8.2),
$$F_{n_k-1,j(n_k)}(A_{n_k-1}(t))\le f(k-1).$$
Since $n^*\le n_{k+2}$ we have $A_{n_{k+2}}(t)\subset A_{n^*}(t)$. Since $n_k\le n_{k+2}$ and $j(n_k)+1\le
j(n_{k+1})\le j(n_{k+2})$, it holds that $F_{n_k,j(n_k)+1}(A_{n^*}(t))\ge f(k+2)$, so that (8.23)
implies
$$\frac1{2r}a(n_k)=\frac1r a(n)\le f(k-1)-f(k+2),$$
and the proof follows as usual, by summing this inequality for $n_k\in I$ to bound
$\sum_{n\in I}a(n)$, using also that $j(0)=j_0$ and using also $a(0)=r^{-j_0}$. □


8.2 Tail Inequalities 


Consider independent symmetric r.v.s $(Y_i)_{i\ge1}$. Assume that we control the tails
of each of them. How do we control the tails of a sum $\sum_{i\ge1}t_iY_i$? Let us start
with a particularly instructive case. The following is a simple consequence of
Lemma 6.4.5:


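Lemma 8.2.1 Consider i.i.d. copies $(Y_i)_{i\ge1}$ of a symmetric r.v. $Y$ which satisfies
the following condition:
$$\forall\,u>0,\quad P(|Y|\ge u)\le2\exp(-u). \tag{8.24}$$
Then for numbers $(t_i)_{i\ge1}$ and any $u>0$ we have
$$P\Big(\Big|\sum_{i\ge1}t_iY_i\Big|\ge u\Big)\le2\exp\Big(-\frac1L\min\Big(\frac{u^2}{\sum_{i\ge1}t_i^2},\frac{u}{\max_{i\ge1}|t_i|}\Big)\Big). \tag{8.25}$$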


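Exercise 8.2.2 Assume now that rather than (8.24), we have
$$P(|Y|\ge u)\le2\exp(-u^p) \tag{8.26}$$
for some $p\ge1$. Denote by $q$ the conjugate exponent of $p$. Prove that for $p\le2$ we
have
$$P\Big(\Big|\sum_{i\ge1}t_iY_i\Big|\ge u\Big)\le2\exp\Big(-\frac1K\min\Big(\frac{u^2}{\sum_{i\ge1}t_i^2},\frac{u^p}{(\sum_{i\ge1}|t_i|^q)^{p/q}}\Big)\Big) \tag{8.27}$$
whereas if $p\ge2$
$$P\Big(\Big|\sum_{i\ge1}t_iY_i\Big|\ge u\Big)\le2\exp\Big(-\frac1K\max\Big(\frac{u^2}{\sum_{i\ge1}t_i^2},\frac{u^p}{(\sum_{i\ge1}|t_i|^q)^{p/q}}\Big)\Big). \tag{8.28}$$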


In parallel with the way we defined Bernoulli processes, one may now define a
canonical process based on the r.v.s $Y_i$ by
$$X_t=\sum_{i\ge1}t_iY_i \tag{8.29}$$
for $t\in\ell^2$. Assuming (8.24), combining (8.25) and Theorem 4.5.13 we obtain
$$E\sup_{t\in T}X_t\le L\big(\gamma_2(T,d_2)+\gamma_1(T,d_\infty)\big). \tag{8.30}$$


When $1\le p\le2$, under (8.26) one obtains similarly
$$E\sup_{t\in T}X_t\le L\big(\gamma_2(T,d_2)+\gamma_p(T,d_q)\big). \tag{8.31}$$
When $p>2$, under (8.26), we obtain the bounds $E\sup_{t\in T}X_t\le K\gamma_2(T,d_2)$ and
$E\sup_{t\in T}X_t\le K\gamma_p(T,d_q)$. One may then interpolate between these bounds to
obtain
$$E\sup_{t\in T}X_t\le K\inf\big\{\gamma_2(T_1,d_2)+\gamma_p(T_2,d_q);\ T\subset T_1+T_2\big\}. \tag{8.32}$$


This is very similar to the bound (6.10) (see (6.9)) on Bernoulli processes.³ The
obvious question is whether the previous bounds can be reversed when (8.26) is
optimal, say
$$P(|Y|\ge u)=\exp(-u^p). \tag{8.33}$$


The author proved this in [113]. These results were then generalized by R. Latała
[48], who considers r.v.s with far more general tail conditions than (8.26). Latała's
results are the object of the rest of this chapter. Latała's work often displays a very
high level of sophistication, and this is certainly the case here.

Throughout this section and the next, we consider independent symmetric r.v.s
$(Y_i)_{i\ge1}$. We assume that the functions
$$U_i(x)=-\log P(|Y_i|\ge x) \tag{8.34}$$
are convex. In the important special case (8.33), we have $U_i(x)=x^p$. Since it is
only a matter of normalization, we assume that $U_i(1)=1$. Since $U_i(0)=0$ we then
have $U_i'(1)\ge1$ by convexity.

In the remainder of this section, we provide the proper generalization of (8.27)
and (8.28). A first idea "is to redefine the function $U_i$ as $x^2$ for $-1\le x\le1$". In
order to preserve convexity, we consider the function $\hat U_i(x)$ (defined on all of $\mathbb R$) given
by
$$\hat U_i(x)=\begin{cases}x^2 & \text{if } 0\le|x|\le1,\\ 2U_i(|x|)-1 & \text{if } |x|\ge1,\end{cases} \tag{8.35}$$


³Bernoulli processes, which can be thought of as the "limiting case $p=\infty$", motivated the present
investigation.
so that this function is convex (at $|x|=1$ its slope jumps from $2$ to $2U_i'(1)\ge2$). Given $u\ge0$, we define
$$N_u(t)=\sup\Big\{\sum_{i\ge1}t_ia_i;\ \sum_{i\ge1}\hat U_i(a_i)\le u\Big\}. \tag{8.36}$$
Proposition 8.2.3 If $u>0$ and $v\ge1$, the r.v. $X_t=\sum_{i\ge1}t_iY_i$ satisfies
$$P\big(X_t\ge LvN_u(t)\big)\le\exp(-uv). \tag{8.37}$$

To get a feeling of what happens, let us first carry out the meaning of $N_u(t)$ in
simple cases. The simplest case is when $U_i(x)=x^2$ for all $i$. It is rather immediate
then that $x^2\le\hat U_i(x)\le2x^2$ and
$$\sqrt{u/2}\,\|t\|_2\le N_u(t)\le\sqrt u\,\|t\|_2, \tag{8.38}$$
and (8.37) takes the less mysterious form $P(X_t\ge Lv\sqrt u\|t\|_2)\le\exp(-uv)$.

The second simplest example is the case where for all $i$ we have $U_i(x)=x$ for
$x\ge0$. In that case we have $|x|\le\hat U_i(x)=2|x|-1\le x^2$ for $|x|\ge1$. Thus
$\hat U_i(x)\le x^2$ and $\hat U_i(x)\le2|x|$ for all $x\ge0$, and hence
$$\sum_{i\ge1}a_i^2\le u\ \Rightarrow\ \sum_{i\ge1}\hat U_i(a_i)\le u$$
and
$$\sum_{i\ge1}2|a_i|\le u\ \Rightarrow\ \sum_{i\ge1}\hat U_i(a_i)\le u.$$
Consequently, we have $N_u(t)\ge\sqrt u\|t\|_2$ and $N_u(t)\ge u\|t\|_\infty/2$. Moreover, if
$\sum_{i\ge1}\hat U_i(a_i)\le u$, writing $b_i=a_i\mathbf 1_{\{|a_i|>1\}}$ and $c_i=a_i\mathbf 1_{\{|a_i|\le1\}}$ we have $\sum_{i\ge1}|b_i|\le
u$ (since $\hat U_i(x)\ge|x|$ for $|x|\ge1$) and $\sum_{i\ge1}c_i^2\le u$ (since $\hat U_i(x)\ge x^2$ for $|x|\le1$).
Consequently
$$\sum_{i\ge1}t_ia_i=\sum_{i\ge1}t_ib_i+\sum_{i\ge1}t_ic_i\le u\|t\|_\infty+\sqrt u\|t\|_2,$$
and we have shown that
$$\frac12\max\big(u\|t\|_\infty,\sqrt u\|t\|_2\big)\le N_u(t)\le u\|t\|_\infty+\sqrt u\|t\|_2, \tag{8.39}$$
and (8.37) means that
$$P\big(X_t\ge Lv(\sqrt u\|t\|_2+u\|t\|_\infty)\big)\le\exp(-uv),$$
which is just another way to write (8.25).
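The two-sided bound (8.39) can be compared with an exact computation. For $U_i(x)=x$ one can work out (this computation is ours, not the text's) that the conjugate of $\hat U_i$ is $\lambda^2/4$ on $[0,2]$ and $+\infty$ beyond, and Lagrangian duality then gives the closed form of $N_u(t)$ stated in the docstring below. The Python sketch also builds an explicit admissible sequence $(a_i)$ attaining the claimed value, so the lower inequality "$N_u(t)\ge$ claimed value" is checked directly against (8.36), and then it checks both sides of (8.39). Function names are hypothetical.

```python
import math


def u_hat(x: float) -> float:
    """U-hat of (8.35) when U(x) = x: x^2 on [-1, 1] and 2|x| - 1 outside."""
    ax = abs(x)
    return ax * ax if ax <= 1.0 else 2.0 * ax - 1.0


def n_u_linear(t: list[float], u: float) -> tuple[float, list[float]]:
    """N_u(t) of (8.36) when U_i(x) = x for all i, with a sequence (a_i) attaining it.

    Closed form (our own computation via Lagrangian duality):
      sqrt(u)*||t||_2                            if ||t||_2 >= sqrt(u)*||t||_inf,
      u*||t||_inf/2 + ||t||_2^2/(2*||t||_inf)    otherwise.
    """
    norm2 = math.sqrt(sum(x * x for x in t))
    norm_inf = max((abs(x) for x in t), default=0.0)
    if norm_inf == 0.0:
        return 0.0, [0.0] * len(t)
    if norm2 >= math.sqrt(u) * norm_inf:
        # All a_i = sqrt(u) t_i / ||t||_2 lie in [-1, 1], where U-hat(x) = x^2.
        a = [math.sqrt(u) * x / norm2 for x in t]
        return math.sqrt(u) * norm2, a
    # Otherwise take a_i = t_i / ||t||_inf and spend the leftover budget on a
    # largest coordinate, where U-hat grows linearly with slope 2.
    a = [x / norm_inf for x in t]
    i_max = max(range(len(t)), key=lambda i: abs(t[i]))
    slack = u - sum(u_hat(x) for x in a)
    a[i_max] = math.copysign(1.0 + slack / 2.0, t[i_max])
    return u * norm_inf / 2.0 + norm2 ** 2 / (2.0 * norm_inf), a


def check_8_39(t: list[float], u: float) -> None:
    value, a = n_u_linear(t, u)
    norm2 = math.sqrt(sum(x * x for x in t))
    norm_inf = max(abs(x) for x in t)
    tol = 1e-9 * max(1.0, value)
    assert sum(u_hat(x) for x in a) <= u + 1e-9 * max(1.0, u)           # (a_i) is admissible
    assert abs(sum(x * y for x, y in zip(t, a)) - value) <= tol          # and attains the value
    assert 0.5 * max(u * norm_inf, math.sqrt(u) * norm2) <= value + tol  # left side of (8.39)
    assert value <= u * norm_inf + math.sqrt(u) * norm2 + tol            # right side of (8.39)


for t in ([1.0, 0.0], [1.0, 1.0, 1.0, 1.0], [3.0, -0.5, 0.2], [0.1] * 50):
    for u in (0.5, 1.0, 4.0, 100.0):
        check_8_39(t, u)
```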
We start the proof of the tail estimate (8.37) along the standard Cramér-Chernoff
method:
$$P\Big(\sum_{i\ge1}t_iY_i\ge u\Big)\le\inf_{\lambda>0}\exp\Big(-u\lambda+\sum_{i\ge1}\log E\exp\lambda t_iY_i\Big), \tag{8.40}$$
but the rest of the argument is not standard. To use (8.40) we first need to estimate
$E\exp\lambda Y_i$. Since we control the tails of $Y_i$, this is not going to be very difficult. For
$\lambda\ge0$ we define
$$V_i(\lambda)=\sup_x\big(\lambda x-\hat U_i(x)\big). \tag{8.41}$$
Since the function $U_i$ is convex, the limit $\lambda_i:=\lim_{x\to\infty}U_i(x)/x\in[1,\infty]$ exists
and $V_i(\lambda)<\infty$ for $\lambda<\lambda_i$. Note also that obviously $V_i$ is an increasing function
of $\lambda$.
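To see the Cramér-Chernoff step (8.40) at work, here is a Python sketch for the case $U_i(x)=x$, assuming the $Y_i$ have the symmetric exponential density $e^{-|x|}/2$, for which $P(|Y_i|\ge x)=e^{-x}$ exactly and $E\exp\lambda Y_i=1/(1-\lambda^2)$ for $|\lambda|<1$. The infimum over $\lambda$ is located by ternary search, which is legitimate because the exponent is convex in $\lambda$. This only illustrates the standard part of the argument; the function name is hypothetical.

```python
import math


def chernoff_bound(t: list[float], u: float, iters: int = 200) -> float:
    """Bound (8.40) on P(sum_i t_i Y_i >= u) for i.i.d. Y_i with density exp(-|x|)/2.

    Assumed law: P(|Y_i| >= x) = exp(-x), i.e. U_i(x) = x, and
    E exp(lam * Y_i) = 1 / (1 - lam**2) for |lam| < 1.  Requires t != 0.
    """
    t_max = max(abs(x) for x in t)

    def exponent(lam: float) -> float:
        # -u*lam + sum_i log E exp(lam * t_i * Y_i), finite for lam * t_max < 1.
        return -u * lam - sum(math.log1p(-((lam * x) ** 2)) for x in t)

    lo, hi = 0.0, (1.0 - 1e-12) / t_max
    for _ in range(iters):
        # Ternary search; valid because the exponent is convex in lam.
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if exponent(m1) <= exponent(m2):
            hi = m2
        else:
            lo = m1
    return min(1.0, math.exp(exponent((lo + hi) / 2.0)))


for u in (2.0, 5.0, 10.0):
    print(u, chernoff_bound([1.0, 2.0, 0.5], u))
```

For a single term the optimal $\lambda$ solves $u\lambda^2+2\lambda-u=0$, which gives an easy cross-check of the search.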


Lemma 8.2.4 For $\lambda\ge0$ we have
$$E\exp\lambda Y_i\le\exp V_i(L\lambda). \tag{8.42}$$
Proof Let us first observe (taking $x=0$ in (8.41)) that $V_i\ge0$, and $V_i$ is convex
with $V_i(0)=0$. Taking $x=\lambda/2$, and since $\hat U_i(x)=x^2$ for $|x|\le1$, we get
$$\lambda\le2\ \Rightarrow\ V_i(\lambda)\ge\frac{\lambda^2}4 \tag{8.43}$$
and taking $x=1$ that
$$V_i(\lambda)\ge\lambda-1. \tag{8.44}$$
Since $U_i'(1)\ge1$, for $x\ge1$ we have $U_i(x)\ge x$, so that by (8.34) we have
$P(|Y_i|\ge x)\le e^{-x}$ and hence (using, e.g., that $x^2\le L\exp(|x|/6)$),
$$EY_i^2\exp\frac{|Y_i|}2\le L.$$
The elementary inequality $e^x\le1+x+x^2e^{|x|}$ yields (since $EY_i=0$ by symmetry) that, if $\lambda\le1/2$,
$$E\exp\lambda Y_i\le1+\lambda^2EY_i^2\exp\lambda|Y_i|\le1+L\lambda^2\le\exp L\lambda^2. \tag{8.45}$$
Now since $\lambda\le1/2$, we have $\lambda^2/4\le V_i(\lambda)$, and since $V_i$ is convex, $V_i\ge0$, and
$V_i(0)=0$, we have $4LV_i(\lambda)\le V_i(4L\lambda)$, so that $L\lambda^2\le V_i(4L\lambda)$. This completes
the proof of (8.42) in the case $\lambda\le1/2$.
Assume now that $\lambda\ge1/2$, and observe that
$$E\exp\lambda|Y_i|=1+\int_0^\infty\lambda\exp(\lambda x)P(|Y_i|\ge x)\,dx=1+\int_0^\infty\lambda\exp\big(\lambda x-U_i(x)\big)\,dx. \tag{8.46}$$
We will prove that, for $x\ge0$,
$$\lambda x-U_i(x)\le\frac{V_i(6\lambda)}2-\lambda x. \tag{8.47}$$
Combining with (8.46), this yields
$$E\exp\lambda|Y_i|\le1+\int_0^\infty\lambda\exp\Big(\frac{V_i(6\lambda)}2-\lambda x\Big)dx=1+\exp\frac{V_i(6\lambda)}2.$$
Now $V_i(6\lambda)\ge V_i(3)\ge2$ (using (8.44) in the last inequality), so that $1+
\exp(V_i(6\lambda)/2)\le\exp(V_i(6\lambda))$, completing the proof of (8.42).
To prove (8.47) we first consider the case where $x\le1$. Then $4\lambda x\le4\lambda$, $4\lambda\le
6\lambda-1$ (since $\lambda\ge1/2$), and $6\lambda-1\le V_i(6\lambda)$ by (8.44), so that $4\lambda x\le V_i(6\lambda)$. Thus
$\lambda x\le V_i(6\lambda)/2-\lambda x$, and we have
$$\lambda x-U_i(x)\le\lambda x\le\frac{V_i(6\lambda)}2-\lambda x.$$
When $x\ge1$ we have $U_i(x)\ge\hat U_i(x)/2$ and then
$$\lambda x-U_i(x)\le\lambda x-\frac{\hat U_i(x)}2=\frac{4\lambda x-\hat U_i(x)}2-\lambda x\le\frac{V_i(4\lambda)}2-\lambda x,$$
because $4\lambda x-\hat U_i(x)\le V_i(4\lambda)$ by definition of $V_i$. Since $V_i(4\lambda)\le V_i(6\lambda)$ the proof
is complete. □
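The lower bounds (8.43) and (8.44) on $V_i$ only use the two test points $x=\lambda/2$ and $x=1$ in (8.41). The Python sketch below, written for the model case $U_i(x)=x^p$ with $p>1$ (an assumption made for illustration; then (8.35) gives $\hat U_i(x)=2|x|^p-1$ for $|x|\ge1$), approximates $V_i$ from below on a grid and checks both bounds for a few values of $\lambda$. Since $\hat U_i$ is even and $\lambda\ge0$, it suffices to look at $x\ge0$. Function names are hypothetical.

```python
def u_hat_power(x: float, p: float) -> float:
    """U-hat of (8.35) for the model case U(x) = x**p: x^2 on [-1, 1], 2|x|^p - 1 outside."""
    ax = abs(x)
    return ax * ax if ax <= 1.0 else 2.0 * ax ** p - 1.0


def legendre_v(lam: float, p: float, x_max: float = 20.0, steps: int = 20_000) -> float:
    """Lower grid approximation of V(lam) = sup_x (lam*x - U-hat(x)) from (8.41).

    The grid step is x_max/steps = 0.001; it contains x = 0 (so the result is >= 0)
    and the test points x = 0.25, 0.5, 1 used below.
    """
    return max(lam * (x_max * k / steps) - u_hat_power(x_max * k / steps, p)
               for k in range(steps + 1))


for p in (1.5, 2.0, 3.0):
    for lam in (0.5, 1.0, 2.0, 3.0):
        v = legendre_v(lam, p)
        if lam <= 2.0:
            assert v >= lam ** 2 / 4 - 1e-12   # (8.43)
        assert v >= lam - 1 - 1e-12            # (8.44)
```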
Lemma 8.2.5 For any $u>0$ we have
$$\sum_{i\ge1}V_i\Big(\frac{u|t_i|}{N_u(t)}\Big)\le u.$$
Proof Recalling the definition (8.41) of $V_i$, it suffices to show that given numbers
$x_i\ge0$, we have
$$\sum_{i\ge1}\Big(\frac{u|t_i|x_i}{N_u(t)}-\hat U_i(x_i)\Big)\le u. \tag{8.48}$$
If $\sum_{i\ge1}\hat U_i(x_i)\le u$, then by definition (8.36) of $N_u(t)$, we have $\sum_{i\ge1}|t_i|x_i\le
N_u(t)$, so we are done since $\sum_{i\ge1}\hat U_i(x_i)\ge0$. If $\sum_{i\ge1}\hat U_i(x_i)=\theta u$ with $\theta>1$,
then (since $\hat U_i(0)=0$ and $\hat U_i$ is convex) we have $\sum_{i\ge1}\hat U_i(x_i/\theta)\le u$, so that by
definition of $N_u(t)$, $\sum_{i\ge1}|t_i|x_i\le\theta N_u(t)$ and the left-hand side of (8.48) is in fact
$\le0$. □


Lemma 8.2.6 If $v\ge1$ we have
$$N_{uv}(t)\le vN_u(t). \tag{8.49}$$
Proof Consider numbers $a_i$ with $\sum_{i\ge1}\hat U_i(a_i)\le uv$. By convexity of $\hat U_i$, for $v\ge1$
we have $\hat U_i(a_i/v)\le\hat U_i(a_i)/v$, so that $\sum_{i\ge1}\hat U_i(a_i/v)\le u$. By definition of $N_u(t)$,
we then have $\sum_{i\ge1}t_ia_i/v\le N_u(t)$, i.e., $\sum_{i\ge1}t_ia_i\le vN_u(t)$. The definition of
$N_{uv}(t)$ then implies (8.49). □
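In the model case $U_i(x)=x$, both lemmas can be tested numerically. The Python sketch below assumes the closed form of $N_u(t)$ for this case (our own computation by Lagrangian duality, not stated in the text) and the corresponding conjugate $V(\lambda)=\lambda^2/4$ for $0\le\lambda\le2$, $V(\lambda)=+\infty$ for $\lambda>2$; it then checks the conclusions of Lemma 8.2.5 and Lemma 8.2.6 on random vectors. Function names are hypothetical.

```python
import math
import random


def n_u_linear(t: list[float], u: float) -> float:
    """Closed form of N_u(t) when U_i(x) = x for all i (t not identically 0).

    Our own computation via Lagrangian duality, not taken from the text.
    """
    norm2 = math.sqrt(sum(x * x for x in t))
    norm_inf = max(abs(x) for x in t)
    if norm2 >= math.sqrt(u) * norm_inf:
        return math.sqrt(u) * norm2
    return u * norm_inf / 2.0 + norm2 ** 2 / (2.0 * norm_inf)


def v_linear(lam: float) -> float:
    """V(lam) of (8.41) for U(x) = x: lam^2/4 on [0, 2] and +infinity beyond."""
    return lam * lam / 4.0 if lam <= 2.0 else math.inf


rng = random.Random(0)
for _ in range(1000):
    t = [rng.uniform(-1.0, 1.0) for _ in range(rng.randint(1, 20))]
    u = rng.uniform(0.1, 50.0)
    n_u = n_u_linear(t, u)
    # Lemma 8.2.5: sum_i V(u |t_i| / N_u(t)) <= u.
    assert sum(v_linear(u * abs(x) / n_u) for x in t) <= u * (1 + 1e-12)
    # Lemma 8.2.6: N_{uv}(t) <= v N_u(t) for v >= 1.
    v = rng.uniform(1.0, 10.0)
    assert n_u_linear(t, u * v) <= v * n_u * (1 + 1e-12)
```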


Proof of Proposition 8.2.3 Since by Lemma 8.2.6 we have $vN_u(t)\ge N_{uv}(t)$, we
can assume $v=1$. Lemma 8.2.4 implies
$$P(X_t\ge y)\le\exp(-\lambda y)E\exp\lambda X_t\le\exp\Big(-\lambda y+\sum_{i\ge1}V_i(L\lambda|t_i|)\Big).$$
We choose $y=2LN_u(t)$, $\lambda=2u/y$, and we apply Lemma 8.2.5 to get
$$-\lambda y+\sum_{i\ge1}V_i(L\lambda|t_i|)\le-2u+u=-u. \qquad\square$$
i>l 
We now define
$$B(u)=\{t;\ N_u(t)\le u\}. \tag{8.50}$$
These sets will play an essential part in the rest of the chapter.
Corollary 8.2.7 If $u\ge1$ and $t\in B(u)$, we have $\|X_t\|_u\le Lu$.⁴
Proof By definition of $B(u)$ we have $N_u(t)\le u$. From (8.37) for $v\ge1$ we have
$P(|X_t|\ge Lvu)\le2\exp(-uv)$. The r.v. $Y=|X_t|/L$ then satisfies $P(Y\ge w)\le
2\exp(-w)$ for $w\ge u$. We write $Y=Y_1+Y_2$ where $Y_1=Y\mathbf 1_{\{Y\le u\}}$ and $Y_2=
Y\mathbf 1_{\{Y>u\}}$. Thus $P(Y_2\ge w)\le2\exp(-w)$ for $w>0$, so that $\|Y_2\|_u\le Lu$ by (2.22).
And we have $|Y_1|\le u$ so that $\|Y_1\|_u\le u$. □

⁴Here $\|X_t\|_u$ is the $L^p$ norm for $p=u$.
Since the sets $B(u)$ are so important in the sequel, let us describe them in some
simple cases. We denote by $B_p(0,v)$ the balls in $\ell^p$ for $1\le p\le\infty$. When $U_i(x)=
x^2$ for all $i$ and $x$, (8.38) implies
$$B_2(0,\sqrt u)\subset B(u)\subset B_2(0,\sqrt{2u}). \tag{8.51}$$
When $U_i(x)=x$ for all $i$ and all $x\ge0$, (8.39) implies
$$\frac12\big(B_\infty(0,1)\cap B_2(0,\sqrt u)\big)\subset B(u)\subset2\big(B_\infty(0,1)\cap B_2(0,\sqrt u)\big). \tag{8.52}$$


The third simplest example is the case where for some $p\ge1$ and for all $i$, we
have $U_i(x)=U(x)=x^p$ for $x\ge0$. The case $1\le p\le2$ is very similar to the
case $p=1$, so we consider only the case $p\ge2$, which is a little bit trickier. Then
$\hat U(x)=2|x|^p-1$ for $|x|\ge1$. In particular $\hat U(x)\ge x^2$ and $\hat U(x)\ge|x|^p$. Therefore
$$C_u:=\Big\{(a_i)_{i\ge1};\ \sum_{i\ge1}\hat U(a_i)\le u\Big\}\subset B_p(0,u^{1/p})\cap B_2(0,u^{1/2}). \tag{8.53}$$
Using now that $\hat U(x)\le2|x|^p+x^2$ we obtain
$$\frac12\big(B_2(0,\sqrt u)\cap B_p(0,u^{1/p})\big)\subset C_u. \tag{8.54}$$
Thus, from (8.53) and (8.54) we have obtained a pretty accurate description of $C_u$,
and the task is now to translate it into a description of $B(u)$. For this we will appeal to
duality and the Hahn-Banach theorem. In order to minimize technicalities we will
now pretend that we work only with finite sums $\sum_{i\le n}t_iY_i$ (which is the important
case). Let us denote by $(x,y)$ the canonical duality of $\mathbb R^n$ with itself, and for a set
$A\subset\mathbb R^n$, let us define its polar set $A^\circ$⁵ by
$$A^\circ=\{y\in\mathbb R^n;\ \forall\,x\in A,\ (x,y)\le1\}. \tag{8.55}$$
Lemma 8.2.8 We have $B(u)=uC_u^\circ$.
Proof Combine the definitions (8.36) and (8.50). □
Lemma 8.2.9 If $A$ and $B$ are closed balls⁶ in $\mathbb R^n$ and if $A^\circ+B^\circ$ is closed, then
$$\frac12(A^\circ+B^\circ)\subset(A\cap B)^\circ\subset A^\circ+B^\circ. \tag{8.56}$$

⁵Not to be confused with the interior $\mathring A$ of $A$!

⁶A ball $A$ is a convex set with non-empty interior and $A=-A$.
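Proof The proof relies on the so-called bipolar theorem (a consequence of the
Hahn-Banach theorem). This theorem states that for any set $A\subset\mathbb R^n$, $(A^\circ)^\circ$ is
the closed convex hull of $A\cup\{0\}$. It is obvious that $(C\cup D)^\circ=C^\circ\cap D^\circ$. Using this
formula for $C=A^\circ$ and $D=B^\circ$ when $A$ and $B$ are closed convex sets containing $0$ yields
$A\cap B=(A^\circ\cup B^\circ)^\circ$, so that $(A\cap B)^\circ$ is the closed convex hull of $A^\circ\cup B^\circ$. Let
us note that for a ball $A$, $A^\circ$ is a ball, so that $\lambda A^\circ\subset A^\circ$ for $0\le\lambda\le1$. Hence the convex hull
of $A^\circ\cup B^\circ$, which is the union of the sets $\lambda A^\circ+(1-\lambda)B^\circ$ for $0\le\lambda\le1$, contains
$\frac12(A^\circ+B^\circ)$ and is contained in the closed set $A^\circ+B^\circ$. Then (8.56) follows. □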


Since $B(u)=uC_u^\circ$, denoting by $q$ the conjugate exponent of $p$, it then follows
from (8.53), (8.54), and (8.56) that we have
$$\frac12\big(B_2(0,\sqrt u)+B_q(0,u^{1/q})\big)\subset B(u)\subset2\big(B_2(0,\sqrt u)+B_q(0,u^{1/q})\big). \tag{8.57}$$
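For the reader who wants to see where the radii in (8.57) come from, here is the computation written out for $p\ge2$ in $\mathbb R^n$. It uses only Hölder's inequality in the form $(B_p(0,R))^\circ=B_q(0,1/R)$, the scaling rule $(\lambda A)^\circ=\lambda^{-1}A^\circ$ for $\lambda>0$, and the fact that $A^\circ+B^\circ$ is closed here because both polars are compact.

```latex
% Write A := B_p(0,u^{1/p}) and B := B_2(0,\sqrt u), so that (8.53)-(8.54) read
%   (1/2)(A \cap B) \subset C_u \subset A \cap B .
\begin{align*}
u\,A^\circ &= u\,B_q\big(0,u^{-1/p}\big) = B_q\big(0,u^{1-1/p}\big) = B_q\big(0,u^{1/q}\big),\\
u\,B^\circ &= u\,B_2\big(0,u^{-1/2}\big) = B_2\big(0,\sqrt u\big),\\
C_u^\circ &\supset (A\cap B)^\circ \supset \tfrac12\,(A^\circ+B^\circ)
  && \text{by } C_u\subset A\cap B \text{ and (8.56)},\\
C_u^\circ &\subset \Big(\tfrac12(A\cap B)\Big)^\circ = 2\,(A\cap B)^\circ \subset 2\,(A^\circ+B^\circ)
  && \text{by (8.54) and (8.56)}.
\end{align*}
% Multiplying by u and using B(u) = u C_u^\circ (Lemma 8.2.8) gives (8.57).
```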


Exercise 8.2.10 Find a complete proof of (8.57), when we no longer deal with finite 
sums and which does not use the Hahn-Banach theorem. 


8.3 The Structure of Certain Canonical Processes 


In this section we prove a far-reaching generalization of Theorem 2.10.1. Recalling
the r.v.s $Y_i$ of the previous section, and the definition $X_t=\sum_{i\ge1}t_iY_i$ of the
canonical process, we "compute $E\sup_{t\in T}X_t$ as a function of the geometry of $T$".
Recalling (8.50), given a number $r\ge4$, we define
$$\varphi_j(s,t)=\inf\{u>0;\ s-t\in r^{-j}B(u)\} \tag{8.58}$$
when the set in the right-hand side is not empty and $\varphi_j(s,t)=\infty$ otherwise. This
"family of distances" is the right tool to describe the geometry of $T$.
We first provide upper bounds for $E\sup_{t\in T}X_t$.


Theorem 8.3.1 Assume that there exists an admissible sequence $(\mathcal A_n)$ of $T\subset\ell^2$,
and for $A\in\mathcal A_n$ an integer $j_n(A)\in\mathbb Z$ such that
$$\forall\,A\in\mathcal A_n,\ \forall\,s,s'\in A,\quad \varphi_{j_n(A)}(s,s')\le2^n. \tag{8.59}$$
Then
$$E\sup_{t\in T}X_t\le L\sup_{t\in T}\sum_{n\ge0}2^nr^{-j_n(A_n(t))}. \tag{8.60}$$
Proof For $s,t\in A\in\mathcal A_n$, by (8.59) we have $s-t\in r^{-j_n(A)}B(2^n)$, so that by
Corollary 8.2.7 we have $\|X_s-X_t\|_{2^n}=\|X_{s-t}\|_{2^n}\le L2^nr^{-j_n(A)}$. This means that
the diameter of $A_n(t)$ for the distance of $L^p$ with $p=2^n$ is $\le L2^nr^{-j_n(A_n(t))}$. The
result then follows from Theorem 2.7.14. □
To illustrate this statement assume first $U_i(x)=x^2$ for each $i$. Then (and
more generally when $U_i(x)\ge x^2$ for $x\ge1$) by (8.51) we have $\varphi_j(s,t)\le
Lr^{2j}\|s-t\|_2^2$, so that (8.59) holds as soon as $r^{2j_n(A)}\Delta(A,d_2)^2\le2^n/L$, where $d_2$
denotes the distance induced by the norm of $\ell^2$. Taking for $j_n(A)$ the largest integer
that satisfies this inequality implies that the right-hand side of (8.60) is bounded
by $Lr\sup_{t\in T}\sum_{n\ge0}2^{n/2}\Delta(A_n(t),d_2)$. Taking the infimum over the admissible
sequences $(\mathcal A_n)$, this yields
$$E\sup_{t\in T}X_t\le Lr\gamma_2(T,d_2).$$
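The choice of $j_n(A)$ in this illustration is easy to make explicit. The Python sketch below (function name hypothetical; `L` stands for whatever numerical constant appears in $\varphi_j(s,t)\le Lr^{2j}\|s-t\|_2^2$) picks the largest integer $j$ with $r^{2j}\Delta(A,d_2)^2\le2^n/L$ and checks that, by maximality, $2^nr^{-j}\le\sqrt L\,r\,2^{n/2}\Delta(A,d_2)$, which is how the bound $Lr\sup_t\sum_n2^{n/2}\Delta(A_n(t),d_2)$ arises.

```python
import math


def choose_jn(diam: float, n: int, r: float, L: float) -> int:
    """Largest integer j with r**(2*j) * diam**2 <= 2**n / L (requires diam > 0, r > 1)."""
    target = 2.0 ** n / L
    j = math.floor(math.log(target / diam ** 2) / (2.0 * math.log(r)))
    # Correct possible floating-point rounding.
    while r ** (2 * j) * diam ** 2 > target:
        j -= 1
    while r ** (2 * (j + 1)) * diam ** 2 <= target:
        j += 1
    return j


r, L = 4.0, 3.0
for n in range(8):
    for diam in (1e-3, 0.2, 1.0, 7.5):
        j = choose_jn(diam, n, r, L)
        assert r ** (2 * j) * diam ** 2 <= 2.0 ** n / L          # (8.59) is ensured
        # Maximality of j gives the per-term bound used for (8.60):
        assert 2.0 ** n * r ** (-j) <= math.sqrt(L) * r * 2.0 ** (n / 2) * diam * (1 + 1e-12)
```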


Assume next that $U_i(x)=x$ for each $i$. When $\|s-t\|_\infty\le r^{-j}/2$, (8.52) implies
$\varphi_j(s,t)\le Lr^{2j}\|s-t\|_2^2$, so that (8.59) holds whenever $r^{j_n(A)}\Delta(A,d_\infty)\le1/L$ and
$r^{2j_n(A)}\Delta(A,d_2)^2\le2^n/L$, where $d_\infty$ denotes the distance induced by the norm of
$\ell^\infty$. Taking for $j_n(A)$ the largest integer that satisfies both conditions yields
$$r^{-j_n(A)}\le Lr\big(\Delta(A,d_\infty)+2^{-n/2}\Delta(A,d_2)\big),$$
so that (8.60) implies
$$E\sup_{t\in T}X_t\le Lr\sup_{t\in T}\sum_{n\ge0}\big(2^n\Delta(A_n(t),d_\infty)+2^{n/2}\Delta(A_n(t),d_2)\big), \tag{8.61}$$
and copying the beginning of the proof of Theorem 4.5.13 this implies
$$E\sup_{t\in T}X_t\le Lr\big(\gamma_2(T,d_2)+\gamma_1(T,d_\infty)\big). \tag{8.62}$$


Let us now turn to the converse of Theorem 8.3.1. Since $U_i$ is convex, $U_i(x)$
grows at least as fast as a linear function. We will assume the following regularity
condition, which ensures that $U_i$ does not grow too fast: For some constant $C_0$, we
have
$$\forall\,i\ge1,\ \forall\,s\ge1,\quad U_i(2s)\le C_0U_i(s). \tag{8.63}$$
This condition is often called "the $\Delta_2$-condition". We will also assume that
$$\forall\,i\ge1,\quad U_i'(0)\ge1/C_0. \tag{8.64}$$
Here, $U_i'(0)$ is the right derivative at $0$ of the function $U_i(x)$.


Theorem 8.3.2 Under conditions (8.63) and (8.64), we can find $r_0$ (depending on
$C_0$ only) and a number $K=K(C_0)$ such that when $r\ge r_0$, for each subset $T$ of $\ell^2$
there exists an admissible sequence $(\mathcal A_n)$ of $T$ and for $A\in\mathcal A_n$ an integer $j_n(A)\in\mathbb Z$
such that (8.59) holds together with
$$\sup_{t\in T}\sum_{n\ge0}2^nr^{-j_n(A_n(t))}\le K(C_0)\,r\,E\sup_{t\in T}X_t. \tag{8.65}$$

Together with Theorem 8.3.1, this essentially allows the computation of 
$E\sup_{t\in T}X_t$ as a function of the geometry of $T$. It is not very difficult to prove
that Theorem 8.3.2 still holds true without condition (8.64), and this is done in [47]. 
But it is an entirely different matter to remove condition (8.63). The difficulty is of 
the same nature as in the study of Bernoulli processes. Now that this difficulty has 
been solved for Bernoulli processes, by solving the Bernoulli conjecture, one may 
hope that eventually condition (8.63) will be removed. 

Let us interpret Theorem 8.3.2 in the case where $U_i(x)=x^2$ for $x\ge1$. In that
case (and more generally when $U_i(x)\le Lx^2$ for $x\ge1$), we have
$$\varphi_j(s,t)\ge r^{2j}\|s-t\|_2^2/L, \tag{8.66}$$
so that (8.59) implies that $\Delta(A,d_2)\le L2^{n/2}r^{-j_n(A)}$ and (8.65) implies
$$\sup_{t\in T}\sum_{n\ge0}2^{n/2}\Delta(A_n(t),d_2)\le LrE\sup_{t\in T}X_t,$$
and hence
$$\gamma_2(T,d_2)\le LrE\sup_{t\in T}X_t. \tag{8.67}$$
Thus Theorem 8.3.2 extends Theorem 2.10.1.

Next consider the case where $U_i(x)=x$ for all $x\ge0$. Then (8.52) implies (8.66) and
thus (8.67). It also implies that $\varphi_j(s,t)=\infty$ whenever $\|s-t\|_\infty>2r^{-j}$, because
then $r^j(s-t)\notin B(u)$ whatever the value of $u$. Consequently, (8.59) implies that
$\Delta(A,d_\infty)\le Lr^{-j_n(A)}$, and (8.65) yields
$$\gamma_1(T,d_\infty)\le LrE\sup_{t\in T}X_t.$$
Recalling (8.62) (and since here $r$ is a universal constant), we thus have proved
the following very pretty fact:


Theorem 8.3.3 Assume that the r.v.s $Y_i$ are independent and symmetric and satisfy
$P(|Y_i|\ge x)=\exp(-x)$. Then
$$\frac1L\big(\gamma_2(T,d_2)+\gamma_1(T,d_\infty)\big)\le E\sup_{t\in T}X_t\le L\big(\gamma_2(T,d_2)+\gamma_1(T,d_\infty)\big). \tag{8.68}$$
Corollary 8.3.4 If $T\subset\ell^2$ then
$$\gamma_2(\operatorname{conv}T,d_2)+\gamma_1(\operatorname{conv}T,d_\infty)\le L\big(\gamma_2(T,d_2)+\gamma_1(T,d_\infty)\big). \tag{8.69}$$
Proof Combine the trivial relation $E\sup_{t\in\operatorname{conv}T}X_t=E\sup_{t\in T}X_t$ with (8.68). □
Research Problem 8.3.5 Give a geometrical proof of (8.69). 


A far more general question occurs in Problem 8.3.12. The next two exercises
explore the subtlety of the behavior of the operation "taking the convex hull"
with respect to the functional $\gamma_1(\cdot,d_\infty)$, but the third exercise, Exercise 8.3.8, is
really important. It makes the Bernoulli conjecture plausible by exhibiting a similar
phenomenon in an easier setting.


Exercise 8.3.6 Consider the canonical basis $(t_n)$ of $\ell^2$, $t_n=(t_{n,i})_{i\ge1}$ with $t_{n,i}=0$
for $i\ne n$ and $t_{n,n}=1$. Give a geometrical proof that if $T=\{t_1,\ldots,t_N\}$ then
$\gamma_1(\operatorname{conv}T,d_\infty)$ is of the same order as $\gamma_1(T,d_\infty)$ (i.e., $\log N$). Caution: This is not
very easy.

Exercise 8.3.7 Prove that it is not true that for a set $T$ of sequences one has
$$\gamma_1(\operatorname{conv}T,d_\infty)\le L\gamma_1(T,d_\infty).$$
Hint: Consider the set $T$ of coordinate functions on $\{-1,1\}^k$.


Exercise 8.3.8 Use (8.57) to prove that in the case $U_i(x)=x^p$, $p\ge2$,
the inequality (8.32) can be reversed. Hint: You need to master the proof of
Theorem 6.7.2.


We now prepare for the proof of Theorem 8.3.2.
Lemma 8.3.9 Under (8.63), given $\rho>0$ we can find $r_0$, depending on $C_0$ and $\rho$
only, such that if $r\ge r_0$, for $u\in\mathbb R^+$ we have
$$B(8ru)\subset\rho rB(u). \tag{8.70}$$
Proof We claim that for a constant $C_1$ depending only on $C_0$ we have
$$\forall\,u>0,\quad \hat U_i(2u)\le C_1\hat U_i(u). \tag{8.71}$$
Indeed, it suffices to prove this for $u$ large, where this follows from the $\Delta_2$-condition
(8.63). Consider an integer $k$ large enough that $2^{-k+3}\le\rho$ and let
$r_0=C_1^k$. Assuming that $r\ge r_0$, we prove (8.70).

Consider $t\in B(8ru)$. Then $N_{8ru}(t)\le8ru$ by definition of $B(8ru)$, so that
according to (8.36) for any numbers $(a_i)_{i\ge1}$ we have
$$\sum_{i\ge1}\hat U_i(a_i)\le8ru\ \Rightarrow\ \sum_{i\ge1}a_it_i\le8ru. \tag{8.72}$$
Consider numbers $b_i$ with $\sum_{i\ge1}\hat U_i(b_i)\le u$. Then by (8.71) we have $\hat U_i(2^kb_i)\le
C_1^k\hat U_i(b_i)\le r\hat U_i(b_i)$, so that $\sum_{i\ge1}\hat U_i(2^kb_i)\le ru\le8ru$, and (8.72) implies
$\sum_{i\ge1}2^kb_it_i\le8ru$. Since $2^k\ge8/\rho$ we have shown that
$$\sum_{i\ge1}\hat U_i(b_i)\le u\ \Rightarrow\ \sum_{i\ge1}b_i\frac{t_i}{\rho r}\le u,$$
so that $N_u(t/(\rho r))\le u$ and thus $t/(\rho r)\in B(u)$, i.e., $t\in\rho rB(u)$. □
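The constants in this proof are completely explicit once $C_1$ is known. A minimal Python sketch (names hypothetical; $C_1$ is passed in as a number, since its dependence on $C_0$ is only asserted in the text) returns the smallest admissible $k$ and $r_0=C_1^k$, and checks the inequality $2^k\ge8/\rho$ that turns $\sum_i2^kb_it_i\le8ru$ into $\sum_ib_it_i\le\rho ru$.

```python
def lemma_8_3_9_constants(rho: float, c1: float) -> tuple[int, float]:
    """Return (k, r0): the smallest integer k >= 0 with 2**(3 - k) <= rho, and r0 = c1**k.

    c1 plays the role of the constant C_1 of (8.71); it is an input here.
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    k = 0
    while 2.0 ** (3 - k) > rho:
        k += 1
    return k, c1 ** k


for rho in (0.01, 0.5, 1.0, 3.0):
    k, r0 = lemma_8_3_9_constants(rho, c1=5.0)
    # With r >= r0 we have C_1^k <= r, and 2^k >= 8/rho gives 8ru/2^k <= rho*r*u.
    assert 8.0 * r0 / 2 ** k <= rho * r0
```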


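Lemma 8.3.10 If (8.70) holds for some $\rho\le1$, then for all $s,t\in T$ and all $j\in\mathbb Z$, we
have $\varphi_{j+1}(s,t)\ge8r\varphi_j(s,t)$.
Proof If $\varphi_{j+1}(s,t)<u$ then $s-t\in r^{-j-1}B(u)\subset r^{-j}B(u/(8r))$ (by (8.70) applied to
$u/(8r)$, since $\rho\le1$ and $B(u/(8r))$ is convex and symmetric) and thus
$\varphi_j(s,t)\le u/(8r)$. Thus $\varphi_j(s,t)\le\varphi_{j+1}(s,t)/(8r)$. □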
Theorem 8.3.11 Under Condition (8.64) we can find a number $0<\rho\le1$ with the
following property. Consider an integer $m\ge2$. Given any points $t_1,\ldots,t_m$ in $\ell^2$
and $a>0$ such that
$$\forall\,\ell\ne\ell',\quad t_\ell-t_{\ell'}\notin aB(u) \tag{8.73}$$
and given any sets $H_\ell\subset t_\ell+\rho aB(u)$, we have
$$E\sup_{t\in\bigcup_{\ell\le m}H_\ell}X_t\ge\frac a{L_0}\min(u,\log m)+\min_{\ell\le m}E\sup_{t\in H_\ell}X_t. \tag{8.74}$$


The proof of this statement parallels that of (2.120). The first ingredient is a
suitable version of Sudakov minoration, proved by R. Latała [47], asserting that,
under (8.73),
$$E\sup_{\ell\le m}X_{t_\ell}\ge\frac aL\min(u,\log m). \tag{8.75}$$
The second is a "concentration of measure" result quantifying the deviation of
$\sup_{t\in H_\ell}X_t$ from its mean, in the spirit of (2.118) and (6.12). This result relies
on a concentration of measure property for the probability $\nu$ of density $e^{-2|x|}$
with respect to Lebesgue measure and its powers, which was discovered in [106].
Condition (8.64) is used here, to assert that the law of $Y_i$ is the image of $\nu$ by a
Lipschitz map.

Both of the above results are fairly deep, and none of the arguments required is
closely related to our main topic, so we refer the reader to [113] and [47].


Proof of Theorem 8.3.2 Consider $\rho$ as in Theorem 8.3.11. If $r=2^{\kappa-3}$, where $\kappa$
is large enough (depending on $C_0$ only), Lemma 8.3.9 shows that (8.70) holds for
each $u>0$. We fix such a value of $r$, and we prove that the functionals $F_{n,j}(A)=
2L_0E\sup_{t\in A}X_t$, where $L_0$ is the constant of (8.74), satisfy the growth condition of
Definition 8.1.1. Consider $n\ge1$ and points $(t_\ell)$ for $\ell\le m=N_n$ as in (8.3). By
definition of $\varphi_{j+1}$ we have
$$\forall\,\ell\ne\ell',\quad t_\ell-t_{\ell'}\notin r^{-j-1}B(2^n). \tag{8.76}$$
Consider then sets $H_\ell\subset B_{j+2}(t_\ell,2^{n+\kappa})$. By definition of $\varphi_{j+2}$, we have
$B_{j+2}(t_\ell,2^{n+\kappa})=t_\ell+r^{-j-2}B(2^{n+\kappa})$. Using (8.70) for $u=2^n$ (and since $2^\kappa=8r$),
we obtain that $B(2^{n+\kappa})\subset\rho rB(2^n)$ and therefore $H_\ell\subset t_\ell+\rho r^{-j-1}B(2^n)$. Since
$\log m=2^n\log2\ge2^{n-1}$, we can then appeal to (8.74) with $a=r^{-j-1}$ to obtain
the desired relation
$$F_{n,j}\Big(\bigcup_{\ell\le m}H_\ell\Big)\ge2^nr^{-j-1}+\min_{\ell\le m}F_{n+1,j+1}(H_\ell)$$
that completes the proof of the growth condition.
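The bookkeeping in this step can be checked mechanically. Taking for granted, as the proof does, that (8.74) holds with constant $L_0$, the increase of $F_{n,j}=2L_0E\sup X_t$ it guarantees is $2a\min(2^n,\log N_n)$ with $a=r^{-j-1}$ (the constant $L_0$ cancels). The Python sketch below (hypothetical helper) compares it with the term $2^nr^{-j-1}$ required by (8.5); the ratio is exactly $2\log2>1$.

```python
import math


def growth_check(n: int, j: int, r: float) -> tuple[float, float]:
    """Compare 2 L0 * (a / L0) * min(2^n, log N_n), with a = r^(-j-1), to 2^n r^(-j-1).

    L0 cancels, so it does not appear; log N_n = 2^n log 2 since N_n = 2^(2^n).
    """
    a = r ** (-j - 1)
    log_m = 2.0 ** n * math.log(2.0)
    gain = 2.0 * a * min(2.0 ** n, log_m)   # increase of F_{n,j} guaranteed by (8.74)
    required = 2.0 ** n * r ** (-j - 1)     # increase demanded by (8.5)
    return gain, required


for kappa in (5, 8):
    r = 2.0 ** (kappa - 3)
    assert 2.0 ** kappa == 8.0 * r          # used to apply (8.70) with u = 2^n
    for n in range(1, 12):
        for j in (-2, 0, 3):
            gain, required = growth_check(n, j, r)
            assert gain >= required
```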
Let us denote by $j_0$ the largest integer such that
$$r^{-j_0}>L_0E\sup_{t\in T}X_t, \tag{8.77}$$
so that
$$r^{-j_0}\le L_0rE\sup_{t\in T}X_t. \tag{8.78}$$
We prove that (8.8) holds for this value of $j_0$. Indeed, supposing that this is not the
case, and recalling (8.58), we can find $t_1,t_2\in T$ with $t_1-t_2\notin aB(1)$ for $a=r^{-j_0}$.
Then using (8.74) for $m=2$ and $H_1=\{t_1/a\}$, $H_2=\{t_2/a\}$ together with the fact
that $X_{at}=aX_t$ yields
$$\frac a{L_0}\le E\max(X_{t_1},X_{t_2})\le E\sup_{t\in T}X_t, \tag{8.79}$$
which contradicts (8.77) and proves the claim.

Taking into account Lemma 8.3.10 (which ensures that (8.9) is satisfied), we are
thus in a position to apply Theorem 8.1.2 to construct an admissible sequence $(\mathcal A_n)$.
Using (8.78), (8.11) implies
$$\forall\,t\in T,\quad \sum_{n\ge0}2^nr^{-j_n(A_n(t))}\le LrE\sup_{t\in T}X_t.$$


To finish the proof, it remains to prove (8.59). By definition of $B_j(t,u)$ and of
$\varphi_j$, we have
$$s\in B_j(t,u)\iff\varphi_j(s,t)\le u\iff s-t\in r^{-j}B(u).$$
Thus (8.12) implies
$$\forall\,n\ge1,\ \forall\,A\in\mathcal A_n,\ \forall\,s\in A,\quad s-t_{n,A}\in r^{-j_n(A)}B(2^n).$$
Since $B(u)$ is a convex symmetric set, we have
$$s-t_{n,A}\in r^{-j_n(A)}B(2^n),\ s'-t_{n,A}\in r^{-j_n(A)}B(2^n)\ \Rightarrow\ \frac{s-s'}2\in r^{-j_n(A)}B(2^n)\ \Rightarrow\ \varphi_{j_n(A)}\Big(\frac s2,\frac{s'}2\Big)\le2^n.$$
Thus we have shown that
$$\forall\,n\ge1,\ \forall\,A\in\mathcal A_n,\ \forall\,s,s'\in A,\quad \varphi_{j_n(A)}\Big(\frac s2,\frac{s'}2\Big)\le2^n.$$
This is not exactly (8.59), but to get rid of the factor 1/2, it would have sufficed to
apply the above proof to $2T=\{2t;\ t\in T\}$ instead of $T$. □


As a consequence of Theorems 8.3.1 and 8.3.2, we have the following geometrical
result. Consider a set $T\subset\ell^2$, an admissible sequence $(\mathcal A_n)$ of $T$ and for $A\in\mathcal A_n$
an integer $j_n(A)$ such that (8.59) holds true. Then there is an admissible sequence
$(\mathcal B_n)$ of $\operatorname{conv}T$ and for $B\in\mathcal B_n$ an integer $j_n(B)$ that satisfies (8.59) and
$$\sup_{t\in\operatorname{conv}T}\sum_{n\ge0}2^nr^{-j_n(B_n(t))}\le K(C_0)\,r\sup_{t\in T}\sum_{n\ge0}2^nr^{-j_n(A_n(t))}. \tag{8.80}$$

Research Problem 8.3.12 Give a geometrical proof of this fact.

This is a far-reaching generalization of Research Problem 2.11.2.
The following generalizes Theorem 2.11.9: 


Theorem 8.3.13 Assume (8.63) and (8.64). Consider a countable subset $T$ of $\ell^2$
with $0\in T$. Then we can find a sequence $(x_n)$ of vectors of $\ell^2$ such that
$$T\subset\Big(K(C_0)E\sup_{t\in T}X_t\Big)\operatorname{conv}\big(\{x_n;\ n\ge2\}\cup\{0\}\big)$$
and, for each $n$,
$$N_{\log n}(x_n)\le1.$$




The point of this result is that, whenever the sequence $(x_n)_{n\ge2}$ satisfies
$N_{\log n}(x_n)\le1$, then $E\sup_{n\ge2}X_{x_n}\le L$. To see this, we simply write, using (8.37)
with $u=\log n$ in the second inequality, that for $v\ge2$, we have
$$P\Big(\sup_{n\ge2}|X_{x_n}|\ge Lv\Big)\le\sum_{n\ge2}P\big(|X_{x_n}|\ge LvN_{\log n}(x_n)\big)\le\sum_{n\ge2}2\exp(-v\log n)\le L\exp(-v/2). \tag{8.81}$$
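The last inequality in (8.81) rests on the fact that $\sum_{n\ge2}2n^{-v}\le L\exp(-v/2)$ for $v\ge2$. A quick Python sketch (our own check, not part of the argument) evaluates the series with an integral bound on the omitted tail and shows that the ratio to $e^{-v/2}$ stays below 4 for a few sample values of $v$; the worst case among them is $v=2$, where the ratio is about 3.5.

```python
import math


def series_upper(v: float, n_terms: int = 200_000) -> float:
    """Upper bound for sum_{n>=2} 2 n^(-v), v > 1: partial sum plus an integral tail bound."""
    partial = sum(2.0 * n ** (-v) for n in range(2, n_terms))
    # sum_{n >= N} n^(-v) <= integral_{N-1}^infinity x^(-v) dx, with N = n_terms.
    tail = 2.0 * (n_terms - 1) ** (1.0 - v) / (v - 1.0)
    return partial + tail


for v in (2.0, 2.5, 3.0, 5.0, 10.0):
    ratio = series_upper(v) / math.exp(-v / 2.0)
    print(f"v = {v:4.1f}   sum / exp(-v/2) <= {ratio:.4f}")
```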


Proof We choose $r=r_0$ depending on $C_0$ only and we consider a sequence of
partitions of $T$ as provided by Theorem 8.3.2. We choose $t_{0,T}=0$, and for $A\in
\mathcal A_n$, $n\ge1$ we select $t_{n,A}\in A$, making sure (as in the proof of Theorem 2.11.9)
that each point of $T$ is of the form $t_{n,A}$ for a certain $A$ and a certain $n$. For $A\in
\mathcal A_n$, $n\ge1$, we denote by $A'$ the unique element of $\mathcal A_{n-1}$ that contains $A$. We
define
$$u_A=\frac{t_{n,A}-t_{n-1,A'}}{2^{n+1}r^{-j_{n-1}(A')}}$$
and $U=\{u_A;\ A\in\mathcal A_n,\ n\ge1\}$. Consider $t\in T$, so that $t=t_{n,A}$ for some $n$ and
some $A\in\mathcal A_n$, and, since $A_0(t)=T$ and $t_{0,T}=0$,
$$t=t_{n,A}=\sum_{1\le k\le n}\big(t_{k,A_k(t)}-t_{k-1,A_{k-1}(t)}\big)=\sum_{1\le k\le n}2^{k+1}r^{-j_{k-1}(A_{k-1}(t))}u_{A_k(t)}.$$
Since $\sum_{k\ge0}2^kr^{-j_k(A_k(t))}\le K(C_0)E\sup_{t\in T}X_t$ by (8.65) (recall that $r=r_0$ depends on $C_0$ only), this shows that
$$T\subset\Big(K(C_0)E\sup_{t\in T}X_t\Big)\operatorname{conv}(U\cup\{0\}).$$
Next, we prove that $N_{2^{n+1}}(u_A)\le1$ whenever $A\in\mathcal A_n$. The definition of $\varphi_j$
and (8.59) imply
$$\forall\,s,s'\in A,\quad s-s'\in r^{-j_n(A)}B(2^n),$$
and the homogeneity of $N_u$ yields
$$\forall\,s,s'\in A,\quad N_{2^n}(s-s')\le r^{-j_n(A)}2^n.$$
Since $t_{n,A},t_{n-1,A'}\in A'$, using the preceding inequality for $n-1$ rather than $n$ and
$A'$ instead of $A$, we get
$$N_{2^{n-1}}(t_{n,A}-t_{n-1,A'})\le2^{n-1}r^{-j_{n-1}(A')},$$
and thus $N_{2^{n-1}}(u_A)\le1/4$, so that $N_{2^{n+1}}(u_A)\le1$ using (8.49).
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Let us enumerate U = (xX,)n>2 in such a manner that the points of the type u4 
for A € A; are enumerated before the points of the type u4 for A € Az, etc. Then 
if x, = ug for A € Ax, we haven < No+ Ni +---+ Ne < N? and therefore 
logn < 2**+!. Thus Mogn(Xn) < Noein) = Noes (ua) < 1. o 
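The counting at the end of the proof uses $N_0=1$ and $N_k=2^{2^k}$ for $k\ge1$. The sketch below (an illustration with exact integer arithmetic) checks $N_0+\dots+N_k\le N_k^2$ and $\log(N_k^2)=2^{k+1}\log2\le2^{k+1}$ for the first few values of $k$.

```python
import math

def N(k):
    """N_0 = 1 and N_k = 2^(2^k) for k >= 1 (cardinality bounds of an admissible sequence)."""
    return 1 if k == 0 else 2 ** (2 ** k)

for k in range(1, 9):
    # Points u_A with A in A_1, ..., A_k are listed first, starting at index 2,
    # so an index n used for A in A_k satisfies n <= 1 + N_1 + ... + N_k.
    largest_index = sum(N(q) for q in range(k + 1))
    assert largest_index <= N(k) ** 2
    # log n <= log(N_k^2) = 2^(k+1) log 2 <= 2^(k+1)
    assert math.log(largest_index) <= math.log(N(k) ** 2) <= 2 ** (k + 1)
```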


Key Ideas to Remember 


• The ideas of Chap. 2 on how to measure the size of metric spaces extend smoothly to the setting of sets provided with families of distances, provided these families of distances satisfy suitable regularity conditions. Unfortunately, these regularity conditions are not satisfied in the most interesting case, that of Bernoulli processes.

• Nonetheless, we can obtain far-reaching generalizations of the majorizing measure theorem and reach a complete understanding of the size of certain “canonical processes” which are linear combinations of well-behaved (e.g., symmetric exponential) r.v.s.


Chapter 9
Peaky Part of Functions


9.1 Road Map 


The results of this chapter will look technical at first, but they are of central importance. We introduce a way to measure the size of a set of functions, which is in a sense a weakening of the quantity $\gamma_2(T,d_2)$. The main idea is to replace the $L^2$ distance by a family of distances obtained by suitable truncation, very much in the spirit of (7.63). This new measure of size will look mysterious at first, but we will eventually prove a structure theorem which gives (essentially) equivalent, more geometrical ways to understand it. The first part of this structure theorem is Theorem 9.2.1, which asserts that controlling the size of the set of functions implies that the set can be decomposed into a sum of simpler pieces. The converse is stated in Proposition 9.4.4.

For certain processes which are indexed by a class of functions (such as the empirical processes of Sect. 6.8), we will later prove that a control of the size of the process implies a control of the size of the class of functions in precisely the manner we are going to introduce now. The structure theorems of the present chapter will then be instrumental in giving a more complete description. Furthermore, such a structure theorem is a key step of the proof of the Latała-Bednorz theorem, the towering result of this work, which we prove in the next chapter.

We advise the reader to review Theorem 6.7.2 at this point. 


9.2 Peaky Part of Functions, II 


Let us consider a measurable space $\Omega$ provided with a positive measure $\nu$.


Theorem 9.2.1 Consider a countable set $T$ of measurable functions on $\Omega$, a number $r\ge2$, and assume that $0\in T$. Consider an admissible sequence of partitions $(\mathcal A_n)$ of $T$, and for $A\in\mathcal A_n$ consider $j_n(A)\in\mathbb Z$, with the following property, where $u>0$ is a parameter:
$$\forall n\ge0,\ \forall s,t\in A\in\mathcal A_n,\quad\int\big|r^{j_n(A)}(s(\omega)-t(\omega))\big|^2\wedge1\,d\nu(\omega)\le u2^n.\tag{9.1}$$
Let $S:=\sup_{t\in T}\sum_{n\ge0}2^nr^{-j_n(A_n(t))}$.¹ Then we can write $T\subset T_1+T_2+T_3$, where $0\in T_1$, where²
$$\gamma_2(T_1,d_2)\le L\sqrt u\,S,\tag{9.2}$$
$$\gamma_1(T_1,d_\infty)\le LS,\tag{9.3}$$
$$\forall t\in T_2,\quad\|t\|_1\le LuS,\tag{9.4}$$
and where moreover
$$\forall t\in T_3,\ \exists s\in T,\quad|t|\le5|s|\mathbf 1_{\{2|s|>r^{-j_0(T)}\}}.\tag{9.5}$$


To illustrate the meaning of this theorem, we replace (9.1) by the stronger condition
$$\forall s,t\in A,\quad\int\big|r^{j_n(A)}(s(\omega)-t(\omega))\big|^2d\nu(\omega)\le u2^n,\tag{9.6}$$
which simply means that $\Delta(A,d_2)\le\sqrt u\,2^{n/2}r^{-j_n(A)}$, so that $\gamma_2(T,d_2)\le\sqrt u\,S$. Then the previous decomposition is provided by Theorem 6.7.2, and we may even take $T_3=\{0\}$. The point of Theorem 9.2.1 is that (9.1) requires a much weaker control of the large values of $s-t$ than (9.6). Condition (9.1) says little about the functions $|s|\mathbf 1_{\{|s|\ge r^{-j_0(T)}\}}$. This is why the term $T_3$ of the decomposition is required. This term is of secondary importance, and in all our applications it will be easy to control. It is however very important that condition (9.5) does not depend on $u$. It is also instructive to convince yourself that the case $u=1$ implies the full statement of Theorem 9.2.1, by applying this case to the measure $\nu'=\nu/u$. Let us also observe that the important case of Theorem 9.2.1 is for $\nu$ an infinite measure, the prime example of which is the counting measure on $\mathbb N$. (When, say, $\nu$ is a probability, it is too easy to satisfy (9.1), and it is too easy for a function to be integrable.)
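A toy computation with the counting measure on three points (values chosen arbitrarily; the helper names are hypothetical) shows how much weaker (9.1) is than (9.6): a single huge coordinate of $s-t$ contributes at most 1 to the truncated sum in (9.1), but it dominates the integral in (9.6).

```python
from fractions import Fraction

def truncated(s, t, r, j):
    """Left side of (9.1) for the counting measure: sum_i |r^j (s_i - t_i)|^2 ∧ 1."""
    scale = Fraction(r) ** j
    return sum(min((scale * (a - b)) ** 2, 1) for a, b in zip(s, t))

def squared_l2(s, t, r, j):
    """Left side of (9.6) for the counting measure: sum_i |r^j (s_i - t_i)|^2."""
    scale = Fraction(r) ** j
    return sum((scale * (a - b)) ** 2 for a, b in zip(s, t))

r, j = 2, 3
s = [Fraction(10**6), Fraction(0), Fraction(1, 64)]
t = [Fraction(0)] * 3
small = truncated(s, t, r, j)     # 65/64: the huge coordinate contributes only 1
large = squared_l2(s, t, r, j)    # about 6.4e13: dominated by the huge coordinate
```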

It is not apparent yet that Theorem 9.2.1 is a sweepingly powerful method to 
perform chaining for a Bernoulli process. 


¹ The idea is of course that the smallest value of $S$ over the preceding choices is the appropriate measure of the size of $T$.

² It is fruitful to think of the quantity $S$ as a generalization of $\sup_t\sum_{n\ge0}2^{n/2}\Delta(A_n(t),d_2)$.
The bad news is that the proof of Theorem 9.2.1 is definitely not appealing. The
principle of the proof is clear, and it is not difficult to follow line by line, but the 
overall picture is far from being transparent. 


Research Problem 9.2.2 Find a proof of Theorem 9.2.1 that you can explain to 
your grandmother. 


Exercise 9.2.3 Isn’t it surprising that there is no dependence on $r$ in (9.2) to (9.5)? Show that in fact the result for general $r$ can be deduced from the result for $r=2$. Hint: Define $j'_n(A)$ as the largest integer with $2^{j'_n(A)}\le r^{j_n(A)}$.
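The hint can be checked mechanically. The sketch below (an illustration; the helper name `j_prime` is hypothetical) computes $j'=\max\{k\in\mathbb Z;\ 2^k\le r^j\}$ with exact integer arithmetic and verifies the two facts the reduction needs: $r^{-j}\le2^{-j'}<2r^{-j}$, so that $S$ grows by a factor less than 2, and $2^{j'}\le r^j$, so that the truncated integrals in (9.1) can only decrease.

```python
from fractions import Fraction

def j_prime(r, j):
    """Largest integer j' with 2**j' <= r**j (r >= 2 an integer, j any integer)."""
    if j >= 0:
        return (r ** j).bit_length() - 1        # floor(log2(r^j))
    m = r ** (-j)                                 # 2^{j'} <= 1/m  <=>  2^{-j'} >= m
    return -((m - 1).bit_length())                # -ceil(log2(m))

for r in [2, 3, 5, 10]:
    for j in range(-6, 7):
        jp = j_prime(r, j)
        # defining property: 2^{j'} <= r^j < 2^{j'+1}
        assert Fraction(2) ** jp <= Fraction(r) ** j < Fraction(2) ** (jp + 1)
        # hence r^{-j} <= 2^{-j'} < 2 r^{-j}
        assert Fraction(r) ** (-j) <= Fraction(2) ** (-jp) < 2 * Fraction(r) ** (-j)
```

Integer `bit_length` is used instead of floating-point logarithms so that exact powers of 2 are not misclassified by rounding.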


When we prove the Bernoulli conjecture, we will need Theorem 9.2.4, a
more general version of Theorem 9.2.1. The statement of Theorem 9.2.4 involves 
some technical conditions whose purpose will only become apparent later, but its 
proof is exactly the same as that of Theorem 9.2.1. In order to avoid repetition, we 
will deduce Theorem 9.2.1 from Theorem 9.2.4 and then prove Theorem 9.2.4. 


Theorem 9.2.4 Consider a countable set $T$ of measurable functions on a measured space $(\Omega,\nu)$, a number $r\ge2$, and assume that $0\in T$. Consider an admissible sequence of partitions $(\mathcal A_n)$ of $T$. For $t\in T$ and $n\ge0$ consider $j_n(t)\in\mathbb Z$ and $\pi_n(t)\in T$. Assume that $\pi_0(t)=0$ for each $t$ and that the following properties hold: First, the values of $j_n(t)$ and $\pi_n(t)$ depend only on $A_n(t)$,
$$\forall s,t\in T,\ \forall n\ge0;\quad s\in A_n(t)\Rightarrow j_n(s)=j_n(t);\ \pi_n(s)=\pi_n(t).\tag{9.7}$$
The sequence $(j_n(t))_{n\ge0}$ is non-decreasing:
$$\forall t\in T,\ \forall n\ge0,\quad j_{n+1}(t)\ge j_n(t).\tag{9.8}$$
When going from $n$ to $n+1$ the value of $\pi_n(t)$ can change only when the value of $j_n(t)$ increases:
$$\forall t\in T,\ \forall n\ge0,\quad j_n(t)=j_{n+1}(t)\Rightarrow\pi_n(t)=\pi_{n+1}(t).\tag{9.9}$$
When going from $n$ to $n+1$, if the value of $j_n(t)$ increases, then $\pi_{n+1}(t)\in A_n(t)$:
$$\forall t\in T,\ \forall n\ge0,\quad j_{n+1}(t)>j_n(t)\Rightarrow\pi_{n+1}(t)\in A_n(t).\tag{9.10}$$
For $t\in T$ and $n\ge0$ we define $\Omega_n(t)\subset\Omega$ as $\Omega_0(t)=\Omega$ if $n=0$ and
$$\Omega_n(t)=\big\{\omega\in\Omega;\ 0\le q<n\Rightarrow|\pi_{q+1}(t)(\omega)-\pi_q(t)(\omega)|\le r^{-j_q(t)}\big\}.\tag{9.11}$$


³ I spent a lot of time on this problem. This does not mean that the proof does not exist, simply that I did not look in the right direction.
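Let us consider a parameter $u\ge1$ and assume that
$$\forall t\in T,\ \forall n\ge0,\quad\int_{\Omega_n(t)}\big|r^{j_n(t)}(t(\omega)-\pi_n(t)(\omega))\big|^2\wedge1\,d\nu(\omega)\le u2^n.\tag{9.12}$$
Then we can write $T\subset T_1+T_2+T_3$, where $0\in T_1$, with
$$\gamma_2(T_1,d_2)\le L\sqrt u\,\sup_{t\in T}\sum_{n\ge0}2^nr^{-j_n(t)},\tag{9.13}$$
$$\gamma_1(T_1,d_\infty)\le L\sup_{t\in T}\sum_{n\ge0}2^nr^{-j_n(t)},\tag{9.14}$$
$$\forall t\in T_2,\quad\|t\|_1\le Lu\sup_{t\in T}\sum_{n\ge0}2^nr^{-j_n(t)},\tag{9.15}$$
and where moreover
$$\forall t\in T_3,\ \exists s\in T,\quad|t|\le5|s|\mathbf 1_{\{2|s|>r^{-j_0(s)}\}}.\tag{9.16}$$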


Proof of Theorem 9.2.1 We deduce this result from Theorem 9.2.4. We set $j_n(t)=\max_{0\le k\le n}j_k(A_k(t))$, and we define
$$p(n,t)=\inf\{p\ge0;\ j_p(t)=j_n(t)\},$$
so that $p(n,t)\le n$ and thus $A_{p(n,t)}(t)\supset A_n(t)$. Also, by definition of $j_n(t)$, for $p=p(n,t)$ we have
$$j_n(t)=j_p(t)=j_p(A_p(t)).\tag{9.17}$$
We define $t_{0,T}=0$. For $A\in\mathcal A_n$, $n\ge1$, we choose an arbitrary point $t_{n,A}$ in $A$. We define
$$\pi_n(t)=t_{p(n,t),B}\quad\text{where}\quad B=A_{p(n,t)}(t),$$
and we note that $\pi_0(t)=0$. When $s\in A_n(t)$ we have $A_p(s)=A_p(t)$ for $p\le n$ and thus $p(n,s)=p(n,t)$, so that $\pi_n(s)=\pi_n(t)$. Also, if $j_{n+1}(t)=j_n(t)$ we have $p(n,t)=p(n+1,t)$, so that $\pi_n(t)=\pi_{n+1}(t)$. This proves that (9.7) to (9.9) hold. Moreover, when $j_n(t)>j_{n-1}(t)$ we have $p(n,t)=n$, so that $\pi_n(t)=t_{n,A}$ for $A=A_n(t)$, and thus $\pi_n(t)\in A_n(t)\subset A_{n-1}(t)$, and this proves (9.10). Finally, (9.1) used for $p=p(n,t)$ and $A_p=A_p(t)=A_{p(n,t)}(t)$ reads
$$\forall s,s'\in B,\quad\int\big|r^{j_p(A_p)}(s-s')\big|^2\wedge1\,d\nu\le u2^p,$$
and this implies (9.12) since by (9.17) $j_p(A_p)=j_n(t)$, $\pi_n(t)=t_{p,B}\in B$, $t\in A_n(t)\subset B$ and $u2^p\le u2^n$. The proof is complete. □
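The choice of $j_n(t)$ and $p(n,t)$ in this proof is a running maximum along the chain $A_0(t)\supset A_1(t)\supset\cdots$. The following sketch (illustrative only; the input list stands for hypothetical values of $j_k(A_k(t))$) computes both quantities and checks $p(n,t)\le n$, (9.17), (9.8), and the fact behind (9.9).

```python
def chain_indices(j_levels):
    """Given j_k(A_k(t)) for k = 0..N along the chain of t, return the running
    maxima j_n(t) and the indices p(n, t) = inf{p >= 0 : j_p(t) = j_n(t)}."""
    jn, p = [], []
    best, best_at = None, None
    for k, value in enumerate(j_levels):
        if best is None or value > best:   # the running max strictly increases: new p
            best, best_at = value, k
        jn.append(best)
        p.append(best_at)
    return jn, p

levels = [0, 2, 1, 2, 5, 3, 5, 6]          # hypothetical values of j_k(A_k(t))
jn, p = chain_indices(levels)
for n in range(len(levels)):
    assert p[n] <= n and jn[n] == levels[p[n]]     # p(n,t) <= n and (9.17)
for n in range(len(levels) - 1):
    assert jn[n] <= jn[n + 1]                      # (9.8)
    if jn[n] == jn[n + 1]:
        assert p[n] == p[n + 1]                    # hence pi_n(t) = pi_{n+1}(t), as in (9.9)
```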


We turn to the proof of Theorem 9.2.4. The principle of the proof is, given $t\in T$, to produce a decomposition $t(\omega)=t^1(\omega)+t^2(\omega)+t^3(\omega)$, where one defines the values $t^1(\omega),t^2(\omega),t^3(\omega)$ from the values $\pi_n(t)(\omega)$ for $n\ge1$, and then to check that the required conditions are satisfied. Despite considerable efforts, the proof is not really intuitive. Maybe it is unavoidable that the proof is not very simple. Theorem 9.4.1 is an immediate consequence of Theorem 9.2.1, and it has sweeping consequences.

Our strategy will be to define $t^1(\omega)$ as $\pi_{n(\omega)}(t)(\omega)$ for a cleverly chosen value of $n(\omega)$.⁴ To prepare for the construction, we may assume that


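$$\sup_{t\in T}\sum_{n\ge0}2^nr^{-j_n(t)}<\infty,\tag{9.18}$$
and in particular that
$$\forall t\in T,\quad\lim_{n\to\infty}j_n(t)=\infty.\tag{9.19}$$

For $t\in T$ and $\omega\in\Omega$, we define
$$m(t,\omega)=\inf\big\{n\ge0;\ |\pi_{n+1}(t)(\omega)-\pi_n(t)(\omega)|>r^{-j_n(t)}\big\}$$
if the set on the right is not empty and $m(t,\omega)=\infty$ otherwise. In words, this is the first place at which $\pi_n(t)(\omega)$ and $\pi_{n+1}(t)(\omega)$ differ significantly. We note from the definition (9.11) of $\Omega_n(t)$ that
$$\Omega_n(t)=\{\omega\in\Omega;\ m(t,\omega)\ge n\}.\tag{9.20}$$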
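To make the definition of $m(t,\omega)$ concrete, here is a sketch at a single point $\omega$, on a finite horizon, with exact rational arithmetic. The values of $\pi_n(t)(\omega)$ and $j_n(t)$ are made up for illustration (they respect $\pi_0(t)=0$, (9.8) and (9.9)); the code compares $\{m(t,\omega)\ge n\}$ with membership in $\Omega_n(t)$ computed directly from (9.11), as asserted in (9.20).

```python
from fractions import Fraction

def first_big_jump(pi_vals, j, r):
    """m(t, omega) on a finite horizon; None stands for m(t, omega) = infinity."""
    for n in range(len(pi_vals) - 1):
        if abs(pi_vals[n + 1] - pi_vals[n]) > Fraction(r) ** (-j[n]):
            return n
    return None

def in_omega_n(pi_vals, j, r, n):
    """Membership of omega in Omega_n(t), straight from definition (9.11)."""
    return all(abs(pi_vals[q + 1] - pi_vals[q]) <= Fraction(r) ** (-j[q]) for q in range(n))

r = 2
j = [0, 1, 1, 3, 4]                                                  # hypothetical j_n(t)
pi_vals = [Fraction(v) for v in ("0", "1/2", "1/2", "3/4", "5/4")]   # hypothetical pi_n(t)(omega)
m = first_big_jump(pi_vals, j, r)
for n in range(len(pi_vals)):
    assert in_omega_n(pi_vals, j, r, n) == (m is None or m >= n)     # this is (9.20)
```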


Lemma 9.2.5 Under the assumptions of Theorem 9.2.4, if $n\le m(t,\omega)$ then
$$\sum_{n\le m<m(t,\omega)}|\pi_{m+1}(t)(\omega)-\pi_m(t)(\omega)|\le2r^{-j_n(t)}.\tag{9.21}$$
Proof By definition of $m(t,\omega)$, we have
$$m<m(t,\omega)\Rightarrow|\pi_{m+1}(t)(\omega)-\pi_m(t)(\omega)|\le r^{-j_m(t)}.\tag{9.22}$$


⁴ One could also attempt to proceed as in the proof of Theorem 6.7.2: to write the chaining identity $t=\sum_{n\ge1}(\pi_n(t)-\pi_{n-1}(t))$ and to use Lemma 6.7.1 for each of the increments $\pi_n(t)-\pi_{n-1}(t)$, with a suitable value of $u=u(t,n)$. This does not seem to make the proof any easier.
From (9.9), when $j_{m+1}(t)=j_m(t)$ we have $\pi_{m+1}(t)=\pi_m(t)$. Consequently for $m<m(t,\omega)$ we have
$$|\pi_{m+1}(t)(\omega)-\pi_m(t)(\omega)|\le r^{-j_m(t)}\mathbf 1_{\{j_{m+1}(t)>j_m(t)\}}.\tag{9.23}$$
Therefore
$$\sum_{n\le m<m(t,\omega)}|\pi_{m+1}(t)(\omega)-\pi_m(t)(\omega)|\le\sum_{m\ge n}r^{-j_m(t)}\mathbf 1_{\{j_{m+1}(t)>j_m(t)\}}.$$
The sum on the right is a sum of terms $r^{-j}$ where the values of $j$ are all different and all at least $j_n(t)$. Since $r\ge2$, this sum is at most twice its largest term, hence at most $2r^{-j_n(t)}$. □
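The last step of this proof uses only that a sum of terms $r^{-j}$ over distinct integers $j\ge j_n(t)$ is at most $r^{-j_n(t)}\,r/(r-1)\le2r^{-j_n(t)}$ when $r\ge2$. A brute-force check over small exponent sets (an illustration with exact arithmetic):

```python
from fractions import Fraction
from itertools import combinations

def sum_distinct_powers(r, exponents):
    """Sum of r^{-j} over a set of distinct integers j."""
    return sum((Fraction(r) ** (-e) for e in exponents), Fraction(0))

j_n = 1                                   # plays the role of j_n(t)
for r in [2, 3, 4]:
    for size in range(1, 6):
        for exps in combinations(range(j_n, j_n + 8), size):
            # distinct exponents >= j_n: bounded by r^{-j_n} * r / (r - 1) <= 2 r^{-j_n}
            assert sum_distinct_powers(r, exps) <= 2 * Fraction(r) ** (-j_n)
```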


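When $m(t,\omega)=\infty$ it follows from (9.21) and (9.19) that the sequence $(\pi_n(t)(\omega))$ is a Cauchy sequence. Consequently $\lim_{n\to\infty}\pi_n(t)(\omega)$ exists. Let us then define
$$t^1(\omega)=\pi_{m(t,\omega)}(t)(\omega)$$
when $m(t,\omega)<\infty$ and
$$t^1(\omega)=\lim_{n\to\infty}\pi_n(t)(\omega)$$
if $m(t,\omega)=\infty$. It follows from (9.21) that for $n\le m<m(t,\omega)$ we have $|\pi_{m+1}(t)(\omega)-\pi_n(t)(\omega)|\le2r^{-j_n(t)}$, so that (and since $t^1(\omega)-\pi_n(t)(\omega)=0$ for $n=m(t,\omega)$)
$$n\le m(t,\omega)\Rightarrow|t^1(\omega)-\pi_n(t)(\omega)|\le2r^{-j_n(t)}.\tag{9.24}$$
According to (9.7) (and since $A_0(t)=T$) the value of $j_0(t)$ is independent of $t$:
$$\forall t\in T,\quad j_0(t)=j_0.\tag{9.25}$$
Since $\pi_0(t)=0$, (9.24) implies
$$|t^1(\omega)|\le2r^{-j_0(t)}=2r^{-j_0}.\tag{9.26}$$
For $t\in T$, we define
$$E(t)=\{\omega\in\Omega;\ |t(\omega)|\le r^{-j_0}/2\}$$
and
$$t^2=(t-t^1)\mathbf 1_{E(t)};\qquad t^3=(t-t^1)\mathbf 1_{E(t)^c}.$$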
We define
$$T_1=\{t^1;\ t\in T\};\quad T_2=\{t^2;\ t\in T\};\quad T_3=\{t^3;\ t\in T\}.$$

For $\omega\in E(t)^c$ we have $|t(\omega)|>r^{-j_0}/2$, whereas $|t^1(\omega)|\le2r^{-j_0}$ by (9.26). Thus for such $\omega$ we have $|t^1(\omega)|\le4|t(\omega)|$. Therefore $|t^3(\omega)|=|t(\omega)-t^1(\omega)|\le5|t(\omega)|$. We have proved that $|t^3|\le5|t|\mathbf 1_{E(t)^c}$, so that the set $T_3$ satisfies (9.16).

We start the study of $T_1$. The reader would do well to review the proof of Theorem 6.7.2, since the method we use here is exactly the same as was used there for the control of $T_1$. The goal is to define a sequence $(U_n)$ of sets with $\operatorname{card}U_n\le N_n$ which are approximations of $T_1$ both in the $L^2$ and $L^\infty$ norm, after which we will obtain (9.13) from Proposition 2.9.7 and (9.14) through a similar principle. For $n\ge0$, we define $t^1_n$ by
$$t^1_n(\omega)=\pi_{n\wedge m(t,\omega)}(t)(\omega),$$
so that
$$\forall\omega\in\Omega,\quad t^1(\omega)=\lim_{n\to\infty}t^1_n(\omega).\tag{9.27}$$
We define our approximating sets
$$U_n=\{t^1_n;\ t\in T\}.$$
We will first control the cardinality of $U_n$ in the next lemma and then show that these sets are good approximations of $T_1$ both in the $L^2$ and the $L^\infty$ norms.


Lemma 9.2.6 We have $\operatorname{card}U_n\le N_n$.

Proof We prove that when $s\in A_n(t)$ then $t^1_n=s^1_n$. In other words, for $A\in\mathcal A_n$ all the elements $t^1_n$ for $t\in A$ are the same. Thus $\operatorname{card}U_n\le\operatorname{card}\mathcal A_n\le N_n$. Consider $s\in A_n(t)$. Then $A_q(s)=A_q(t)$ for $q\le n$, so that $\pi_q(s)=\pi_q(t)$ and $j_q(s)=j_q(t)$ by (9.7). The definition of $m(t,\omega)$ shows that for any $n'$, the points $\pi_q(t)$ for $0\le q\le n'$ entirely determine whether or not it is true that $m(t,\omega)<n'$. Consequently, when $s\in A_n(t)$ we have $n\wedge m(t,\omega)=n\wedge m(s,\omega)$ for each $\omega$, so that $t^1_n=s^1_n$. □

Now we prove that the sets $U_n$ approximate $T_1$ for the $L^\infty$ norm. We note that $t^1(\omega)-t^1_n(\omega)=0$ if $n\ge m(t,\omega)$, and by (9.21) that if $n<m(t,\omega)$, then
$$|t^1(\omega)-t^1_n(\omega)|\le\sum_{n\le m<m(t,\omega)}|\pi_{m+1}(t)(\omega)-\pi_m(t)(\omega)|\le2r^{-j_n(t)}.$$
Thus $\|t^1-t^1_n\|_\infty\le2r^{-j_n(t)}$, and hence $d_\infty(t^1,U_n)\le2r^{-j_n(t)}$. Thus (9.14) follows from the analog of Proposition 2.9.7 for $\gamma_1$ rather than $\gamma_2$.
Before we continue, let us explain how we will use (9.12) and (9.10). 
Lemma 9.2.7 We have
$$\int_{\Omega_n(t)}\big|r^{j_n(t)}(\pi_{n+1}(t)(\omega)-\pi_n(t)(\omega))\big|^2\wedge1\,d\nu(\omega)\le u2^n\tag{9.28}$$
and
$$\nu(\Omega_n(t)\setminus\Omega_{n+1}(t))\le u2^n.\tag{9.29}$$

Proof To prove (9.28) we may assume that $\pi_n(t)\ne\pi_{n+1}(t)$, so that $j_{n+1}(t)>j_n(t)$ by (9.9). Then (9.10) shows that $s:=\pi_{n+1}(t)\in A_n(t)$. Using (9.12) for $s$ rather than $t$, and since $\pi_q(s)=\pi_q(t)$ and $j_q(s)=j_q(t)$ for $q\le n$ (so that $\Omega_n(s)=\Omega_n(t)$) because $s\in A_n(t)$, we obtain (9.28). Then (9.29) follows since $r^{j_n(t)}|\pi_{n+1}(t)-\pi_n(t)|\ge1$ on $\Omega_n(t)\setminus\Omega_{n+1}(t)$. □


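We turn to the proof of (9.13). For this we will show that $U_n$ approximates $T_1$ for the $L^2$ norm.

Lemma 9.2.8 We have
$$\|t^1_{n+1}-t^1_n\|_2\le\sqrt u\,2^{n/2}r^{-j_n(t)}.\tag{9.30}$$
Proof First we observe that
$$t^1_{n+1}-t^1_n=(\pi_{n+1}(t)-\pi_n(t))\mathbf 1_{\Omega_{n+1}(t)}.$$
Indeed, if $\omega\in\Omega_{n+1}(t)=\{m(t,\cdot)\ge n+1\}$, then $t^1_n(\omega)=\pi_n(t)(\omega)$ and $t^1_{n+1}(\omega)=\pi_{n+1}(t)(\omega)$, while if $\omega\notin\Omega_{n+1}(t)$, then $m(t,\omega)\le n$ and $t^1_n(\omega)=t^1_{n+1}(\omega)=\pi_{m(t,\omega)}(t)(\omega)$.

By definition of $m(t,\omega)$ we have $|\pi_{n+1}(t)-\pi_n(t)|\le r^{-j_n(t)}$ whenever $m(t,\cdot)\ge n+1$, i.e., on $\Omega_{n+1}(t)$ by (9.20). Therefore,
$$\|t^1_{n+1}-t^1_n\|_2^2=\int_{\Omega_{n+1}(t)}|\pi_{n+1}(t)(\omega)-\pi_n(t)(\omega)|^2d\nu(\omega)\le r^{-2j_n(t)}\int_{\Omega_{n+1}(t)}\big|r^{j_n(t)}(\pi_{n+1}(t)(\omega)-\pi_n(t)(\omega))\big|^2\wedge1\,d\nu(\omega),$$
so that (9.30) follows from (9.28), since $\Omega_{n+1}(t)\subset\Omega_n(t)$. □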


Proof of (9.13) Combining (9.30) and (9.18) implies that the sequence $(t^1_q)_{q\ge1}$ is a Cauchy sequence in $L^2$, so that it converges to its limit, which is $t^1$ by (9.27), and hence $\lim_{q\to\infty}\|t^1-t^1_q\|_2=0$, so that $\|t^1-t^1_n\|_2=\lim_{q\to\infty}\|t^1_q-t^1_n\|_2$. Consequently
$$d_2(t^1,U_n)\le\|t^1-t^1_n\|_2=\lim_{q\to\infty}\|t^1_q-t^1_n\|_2\le\sum_{m\ge n}\|t^1_{m+1}-t^1_m\|_2\le\sqrt u\sum_{m\ge n}2^{m/2}r^{-j_m(t)}.\tag{9.31}$$
Since
$$\sum_{n\ge0}2^{n/2}\sum_{m\ge n}2^{m/2}r^{-j_m(t)}=\sum_{m\ge0}2^{m/2}r^{-j_m(t)}\sum_{n\le m}2^{n/2}\le L\sum_{m\ge0}2^mr^{-j_m(t)},$$
we conclude by Proposition 2.9.7 again that (9.13) holds. □
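The interchange of summations above rests on the geometric bound $\sum_{n\le m}2^{n/2}\le(2+\sqrt2)\,2^{m/2}$, so that one may take $L=2+\sqrt2$ in that step. A quick floating-point check of this bound (an illustration, not part of the proof):

```python
import math

L = math.sqrt(2) / (math.sqrt(2) - 1)        # equals 2 + sqrt(2), about 3.414

def inner_sum(m):
    """sum_{0 <= n <= m} 2^{n/2}."""
    return sum(2 ** (n / 2) for n in range(m + 1))

for m in range(60):
    # closed form (2^{(m+1)/2} - 1) / (sqrt(2) - 1), which is at most L * 2^{m/2}
    assert inner_sum(m) <= L * 2 ** (m / 2)
```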


We turn to the proof of (9.15). This is where there are new arguments compared to the case of Theorem 6.7.2. We define
$$r(t,\omega)=\inf\big\{n\ge0;\ |\pi_{n+1}(t)(\omega)-t(\omega)|>r^{-j_{n+1}(t)}/2\big\}$$
if the set on the right is not empty and $r(t,\omega)=\infty$ otherwise.


Lemma 9.2.9 Let us define $t^2_n=(t-t^1)\mathbf 1_{E(t)\cap\{r(t,\cdot)=n\}}$. Then
$$t^2=\sum_{n\ge0}t^2_n\tag{9.32}$$
and
$$|t^2_n|\le3r^{-j_n(t)}\mathbf 1_{\{\omega\in\Omega;\ r(t,\omega)=n\}\cap E(t)}.\tag{9.33}$$
Proof Let us fix $\omega\in E(t)$. Then
$$|\pi_0(t)(\omega)-t(\omega)|=|t(\omega)|\le r^{-j_0}/2.\tag{9.34}$$
By definition of $r(t,\omega)$ we have
$$n<r(t,\omega)\Rightarrow|\pi_{n+1}(t)(\omega)-t(\omega)|\le r^{-j_{n+1}(t)}/2.\tag{9.35}$$
Consequently, for $0\le n<r(t,\omega)$,
$$|\pi_{n+1}(t)(\omega)-\pi_n(t)(\omega)|\le|\pi_{n+1}(t)(\omega)-t(\omega)|+|\pi_n(t)(\omega)-t(\omega)|\le\frac12\big(r^{-j_{n+1}(t)}+r^{-j_n(t)}\big)\le r^{-j_n(t)},\tag{9.36}$$
where for $n>0$ we use (9.35) for $n$ and $n-1$ and for $n=0$ we use also (9.34). Consequently $r(t,\omega)\le m(t,\omega)$. When $r(t,\omega)=\infty$ then $m(t,\omega)=\infty$, so that, recalling (9.19), by (9.35) we have $t(\omega)=\lim_{n\to\infty}\pi_n(t)(\omega)=t^1(\omega)$ and $t^2(\omega)=t(\omega)-t^1(\omega)=0$. Therefore we have proved that
$$t^2(\omega)=t(\omega)-t^1(\omega)=(t(\omega)-t^1(\omega))\mathbf 1_{\{r(t,\omega)<\infty\}}=\sum_{n\ge0}(t(\omega)-t^1(\omega))\mathbf 1_{\{r(t,\omega)=n\}}.\tag{9.37}$$
Since this holds for each $\omega\in E(t)$, we have proved (9.32). Now, when $n=r(t,\omega)$, we have $m(t,\omega)\ge n$ and, using (9.24), $|\pi_n(t)(\omega)-t^1(\omega)|\le2r^{-j_n(t)}$. Now, if $n>0$, using (9.35) for $n-1$, we have
$$|t(\omega)-\pi_n(t)(\omega)|\le r^{-j_n(t)}/2,$$
and if $n=0$ this holds by (9.34). Consequently
$$|t(\omega)-t^1(\omega)|\le|t(\omega)-\pi_n(t)(\omega)|+|\pi_n(t)(\omega)-t^1(\omega)|\le3r^{-j_n(t)},\tag{9.38}$$
and this proves (9.33). □
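The key inequality $r(t,\omega)\le m(t,\omega)$ from this proof can be tested on random data. The sketch below (illustrative; the random model for $\pi_n(t)(\omega)$ is an assumption of ours) works on a finite horizon, with $\pi_0=0$, non-decreasing $j_n$ and $|t(\omega)|\le r^{-j_0}/2$, and uses `None` for the value $\infty$. Only these three hypotheses enter the argument (9.34) to (9.36), so the assertion must hold for every sample.

```python
import random
from fractions import Fraction

def first_index(cond, horizon):
    """Smallest n < horizon with cond(n), or None (standing for infinity)."""
    return next((n for n in range(horizon) if cond(n)), None)

def r_and_m(r, j, pi_vals, t_val):
    """Return (r(t, omega), m(t, omega)) on a finite horizon."""
    H = len(pi_vals) - 1
    m = first_index(lambda n: abs(pi_vals[n + 1] - pi_vals[n]) > Fraction(r) ** (-j[n]), H)
    rr = first_index(lambda n: abs(pi_vals[n + 1] - t_val) > Fraction(r) ** (-j[n + 1]) / 2, H)
    return rr, m

rng = random.Random(1)
for _ in range(2000):
    r = rng.choice([2, 3, 4])
    j = sorted(rng.randint(0, 6) for _ in range(6))                          # non-decreasing j_n(t)
    t_val = Fraction(rng.randint(-100, 100), 200) * Fraction(r) ** (-j[0])  # |t| <= r^{-j_0}/2
    pi_vals = [Fraction(0)] + [t_val + Fraction(rng.randint(-40, 40), 40) * Fraction(r) ** (-j[k])
                               for k in range(1, 6)]                         # pi_0 = 0
    rr, m = r_and_m(r, j, pi_vals, t_val)
    if m is not None:                     # a big jump occurs within the horizon ...
        assert rr is not None and rr <= m # ... and it cannot come before r(t, omega)
```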
Lemma 9.2.10 Under the assumptions of Theorem 9.2.4, it holds that
$$\nu(\{\omega\in\Omega;\ r(t,\omega)=n\}\cap E(t))\le Lu2^n.\tag{9.39}$$
Proof Since for $\omega\in E(t)$ we have $r(t,\omega)\le m(t,\omega)$, using (9.20) we get $\{\omega;\ r(t,\omega)=n\}\cap E(t)\subset\Omega_n(t)$ and therefore
$$\nu(\{\omega\in\Omega;\ r(t,\omega)=n\}\cap E(t))\le\nu(\{\omega\in\Omega;\ r(t,\omega)=n\}\cap\Omega_{n+1}(t))+\nu(\Omega_n(t)\setminus\Omega_{n+1}(t)).\tag{9.40}$$
Now, since $|\pi_{n+1}(t)(\omega)-t(\omega)|>r^{-j_{n+1}(t)}/2$ when $r(t,\omega)=n$, we have
$$\frac14\,\nu(\{\omega\in\Omega;\ r(t,\omega)=n\}\cap\Omega_{n+1}(t))\le\int_{\Omega_{n+1}(t)}\big|r^{j_{n+1}(t)}(\pi_{n+1}(t)(\omega)-t(\omega))\big|^2\wedge1\,d\nu(\omega)\le u2^{n+1},$$
using (9.12) for $n+1$ rather than $n$ in the last inequality. Combining with (9.29) completes the proof. □

Combining (9.33) with (9.39) proves that $\|t^2_n\|_1\le Lu2^nr^{-j_n(t)}$. Combining with (9.32), we have proved (9.15) and completed the proof of Theorem 9.2.4.
Let us stress how simple the decomposition of Theorem 9.2.4 is. To explain this we focus on the case where $\|t\|_\infty\le r^{-j_0}/2$ for each $t$ (recalling that $j_0(t)$ does not depend on $t$). Then, recalling (9.11) (and that $\pi_0(t)=0$), we have the formulas
$$t^1=\sum_{n\ge1}(\pi_n(t)-\pi_{n-1}(t))\mathbf 1_{\Omega_n(t)};\qquad t^2=\sum_{n\ge0}(t-\pi_n(t))\mathbf 1_{\Omega_n(t)\setminus\Omega_{n+1}(t)}.\tag{9.41}$$

The main difficulty in the proof of Theorem 9.2.4 is that it does not seem true that we can easily control $\sum_n\|(t-\pi_n(t))\mathbf 1_{\Omega_n(t)\setminus\Omega_{n+1}(t)}\|_1$ (in sharp contrast with the second proof of Theorem 6.7.2).
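Formula (9.41) can be checked at one point $\omega$ with exact arithmetic. In the sketch below (hypothetical data respecting $\pi_0(t)=0$, (9.8), (9.9) and $|t(\omega)|\le r^{-j_0}/2$; the finite horizon is assumed long enough to reach $m(t,\omega)$), the first sum telescopes to $\pi_{m(t,\omega)}(t)(\omega)$ and only the term $n=m(t,\omega)$ survives in the second sum.

```python
from fractions import Fraction

def m_index(pi_vals, j, r):
    """m(t, omega) on a finite horizon (None if no big jump occurs there)."""
    for n in range(len(pi_vals) - 1):
        if abs(pi_vals[n + 1] - pi_vals[n]) > Fraction(r) ** (-j[n]):
            return n
    return None

def peaky_split(t_val, pi_vals, j, r):
    """t^1(omega) and t^2(omega) computed from the two sums in (9.41)."""
    m = m_index(pi_vals, j, r)
    assert m is not None, "horizon too short: m(t, omega) not reached"
    in_omega = lambda n: n <= m          # Omega_n(t) = {m(t, .) >= n}, see (9.20)
    t1 = sum((pi_vals[n] - pi_vals[n - 1] for n in range(1, len(pi_vals)) if in_omega(n)),
             Fraction(0))
    t2 = sum((t_val - pi_vals[n] for n in range(len(pi_vals))
              if in_omega(n) and not in_omega(n + 1)), Fraction(0))
    return m, t1, t2

r, j = 2, [0, 1, 1, 3, 4]                       # j_n(t): non-decreasing, j_0 = 0
pi_vals = [Fraction(v) for v in ("0", "1/2", "1/2", "5/8", "3/2")]
t_val = Fraction(1, 2)                          # |t(omega)| <= r^{-j_0}/2, i.e. omega in E(t)
m, t1, t2 = peaky_split(t_val, pi_vals, j, r)
assert t1 == pi_vals[m]          # telescoping: t^1 = pi_{m(t,omega)}(t), since pi_0 = 0
assert t1 + t2 == t_val          # t = t^1 + t^2 when t^3 = 0
```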

It is hard at this stage to really see the power of Theorem 9.2.4, which will come into full use later. Let us make some comments on this, although the reader may not fully understand them until she studies Chap. 10. Let us think that we are actually trying to construct the partitions $(\mathcal A_n)$ and the corresponding objects of Theorem 9.2.4. We have already constructed these objects up to level $n$, and we try to construct the next level. We have to ensure the constraint (9.10), but short of that we are pretty much free to choose $\pi_{n+1}(t)$ to our liking. The magic is that whatever our choice, we will drop the part $\Omega_n(t)\setminus\Omega_{n+1}(t)$ of $\Omega$ where $\pi_{n+1}(t)$ is too different from $\pi_n(t)$. The reason we can do that is that on this set the decomposition is finished: recalling that we assume for clarity that $E(t)=\Omega$ for each $t$ and recalling that $\Omega_n(t)=\{m(t,\omega)\ge n\}$, on the set $\Omega_n(t)\setminus\Omega_{n+1}(t)$ we have $m(t,\omega)=n$, so that $t^1=\pi_n(t)$ and $t^2=t-\pi_n(t)$. We have already decided the values of $t^1$ and $t^2$! In particular, for the $(n+1)$-st step of the construction, we are concerned only with points $\omega\in\Omega_n(t)$, and for these points the sequence $(\pi_q(t)(\omega))_{q\le n}$ does not have big jumps. We will take great advantage of this feature in the proof of Theorem 6.2.8.⁵

Another point which is quite obvious but needs to be stressed is that in performing a recursive construction of the $(\mathcal A_n)$, we only need to care about controlling the $\pi_n(t)$, and we do not have to worry “that we might lose information about $t$”. If you need to visualize this fact, you may argue as follows. As we explained, if we know that $m(t,\omega)<n$, we already know what $t^1(\omega)$ and $t^2(\omega)$ are, so that we no longer have to worry about this value of $\omega$ in the sequel of the construction. When $m(t,\omega)\ge n$ we may write the peaky part $t^2(\omega)$ of $t$ at $\omega$ as
$$t^2(\omega)=t(\omega)-\pi_{m(t,\omega)}(t)(\omega)=t(\omega)-\pi_n(t)(\omega)+\big(\pi_n(t)(\omega)-\pi_{m(t,\omega)}(t)(\omega)\big),$$


⁵ Despite the fact that the proof of Theorem 9.2.4 is identical to the proof of Theorem 9.2.1, the formulation of Theorem 9.2.4, which is due to W. Bednorz and R. Latała, is a great step forward, as it exactly identifies the essence of Theorem 9.2.1.
and, in a sense, “we have attributed at stage $n$ the part $t(\omega)-\pi_n(t)(\omega)$ of $t(\omega)$ to the peaky part $t^2(\omega)$ of $t$”. (So that we no longer need to think about $t$ itself.)

These considerations may look mysterious now, but they explain why certain 
constructions which we perform in Sect. 10.3 keep sufficient information to succeed. 


9.4 Chaining for Bernoulli Processes 


Our first result is a simple consequence of Theorem 9.2.1. It is a generalization of the generic chaining bound (2.59) to Bernoulli processes. The sweeping effectiveness of this result will be demonstrated soon. We consider a number $r\ge2$ and we recall the quantity $b^*(T)$ of (6.9).


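Theorem 9.4.1 Consider a subset $T$ of $\ell^2$, and assume that $0\in T$. Consider an admissible sequence of partitions $(\mathcal A_n)$ of $T$, and for $A\in\mathcal A_n$ consider a number $j_n(A)\in\mathbb Z$ with the following properties, where $u\ge1$ is a parameter:
$$\forall n\ge0,\ \forall x,y\in A\in\mathcal A_n,\quad\sum_{i\ge1}\big|r^{j_n(A)}(x_i-y_i)\big|^2\wedge1\le u2^n,\tag{9.42}$$
where $x\wedge y=\min(x,y)$. Then
$$b^*(T)\le L\Big(u\sup_{x\in T}\sum_{n\ge0}2^nr^{-j_n(A_n(x))}+\sup_{x\in T}\sum_{i\ge1}|x_i|\mathbf 1_{\{2|x_i|>r^{-j_0(T)}\}}\Big).\tag{9.43}$$
Moreover, if $\varepsilon_i$ are independent Bernoulli r.v.s, for any $p\ge1$, we have
$$\Big(\mathsf E\sup_{x\in T}\Big|\sum_{i\ge1}x_i\varepsilon_i\Big|^p\Big)^{1/p}\le K(p)\,u\sup_{x\in T}\sum_{n\ge0}2^nr^{-j_n(A_n(x))}+L\sup_{x\in T}\sum_{i\ge1}|x_i|\mathbf 1_{\{2|x_i|>r^{-j_0(T)}\}}.\tag{9.44}$$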


Condition (9.42) is simply (9.1) in the case where the measure space is $\mathbb N^*$ with the counting measure. Condition (9.42) may also be written as
$$\forall x,y\in A\in\mathcal A_n,\quad\sum_{i\ge1}|x_i-y_i|^2\wedge r^{-2j_n(A)}\le u2^nr^{-2j_n(A)}.\tag{9.45}$$
The point of writing (9.42) rather than (9.45) is simply that this is more in line with the generalizations of this statement that we shall study later.
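The equivalence of (9.42) and (9.45) is the termwise identity $|r^jz|^2\wedge1=r^{2j}\,(|z|^2\wedge r^{-2j})$, summed over $i$ and multiplied by $r^{-2j_n(A)}$. A randomized exact check of this identity (an illustration only):

```python
import random
from fractions import Fraction

def lhs(z, r, j):
    """One term of (9.42): |r^j z|^2 ∧ 1."""
    return min((Fraction(r) ** j * z) ** 2, 1)

def rhs(z, r, j):
    """r^{2j} times one term of (9.45): r^{2j} (|z|^2 ∧ r^{-2j})."""
    return Fraction(r) ** (2 * j) * min(z ** 2, Fraction(r) ** (-2 * j))

rng = random.Random(0)
for _ in range(1000):
    r = rng.choice([2, 3, 7])
    j = rng.randint(-5, 5)
    z = Fraction(rng.randint(-10**6, 10**6), rng.randint(1, 10**4))
    assert lhs(z, r, j) == rhs(z, r, j)
```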
If, instead of condition (9.45), we had the stronger condition
$$\forall x,y\in A\in\mathcal A_n,\quad\sum_{i\ge1}|x_i-y_i|^2\le u2^nr^{-2j_n(A)},\tag{9.46}$$
this would simply mean that $\Delta(A)\le\sqrt u\,2^{n/2}r^{-j_n(A)}$, so that then $\gamma_2(T)\le\sqrt u\sup_{x\in T}\sum_{n\ge0}2^nr^{-j_n(A_n(x))}$ and we would prove nothing more than the generic chaining bound (2.59). The point of Theorem 9.4.1 is that (9.42) is significantly weaker than condition (9.46), because it requires a much weaker control on the large values of $x_i-y_i$. It may be difficult at this stage to really understand that this is a considerable gain, but some comments that might help may be found on page 435.
The proof of Theorem 9.4.1 relies on the following: 


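Proposition 9.4.2 Under the conditions of Theorem 9.4.1, we can write $T\subset T_1+T_2+T_3$ where $0\in T_1$ and
$$\gamma_2(T_1,d_2)\le L\sqrt u\sup_{x\in T}\sum_{n\ge0}2^nr^{-j_n(A_n(x))},\tag{9.47}$$
$$\gamma_1(T_1,d_\infty)\le L\sup_{x\in T}\sum_{n\ge0}2^nr^{-j_n(A_n(x))},\tag{9.48}$$
$$\forall x\in T_2,\quad\|x\|_1\le Lu\sup_{x\in T}\sum_{n\ge0}2^nr^{-j_n(A_n(x))},\tag{9.49}$$
and
$$\forall x\in T_3,\ \exists y\in T,\ \forall i\in\mathbb N^*,\quad|x_i|\le5|y_i|\mathbf 1_{\{2|y_i|>r^{-j_0(T)}\}}.\tag{9.50}$$

Proof This follows from Theorem 9.2.1 in the case $\Omega=\mathbb N^*$ and where $\nu$ is the counting measure. □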


Proof of Theorem 9.4.1 To prove (9.43) we use that $T\subset T_1+(T_2+T_3)$, so that by definition of $b^*(T)$ we have
$$b^*(T)\le\gamma_2(T_1,d_2)+\sup_{x\in T_2+T_3}\|x\|_1,$$
and we use (9.47), (9.49), and (9.50) (recall that $u\ge1$, so that $\sqrt u\le u$). To prove (9.44) we show that for $j=1,2,3$ the quantity $(\mathsf E\sup_{x\in T_j}|\sum_{i\ge1}x_i\varepsilon_i|^p)^{1/p}$ is bounded by the right-hand side of (9.44). For $j=1$ this follows from (2.66) (using also that $0\in T_1$), and for $j=2,3$ this follows from the bound $(\mathsf E\sup_{x\in T_j}|\sum_{i\ge1}\varepsilon_ix_i|^p)^{1/p}\le\sup_{x\in T_j}\|x\|_1$. The reason we have a factor $u$ (rather than $\sqrt u$) in the right-hand side of (9.43) is that we have a factor $u$ in (9.4) and hence in (9.49). As we will see, this factor does not create problems and there is plenty of room. □


The following is a simple consequence of Theorem 9.4.1: 


Corollary 9.4.3 Assume that moreover
$$\forall x\in T,\quad\|x\|_\infty\le r^{-j_0(T)}/2.\tag{9.51}$$
Then
$$b^*(T)\le Lu\sup_{x\in T}\sum_{n\ge0}2^nr^{-j_n(A_n(x))}.\tag{9.52}$$
Proof In (9.43) the second term in the right-hand side is identically zero. □


Corollary 9.4.3 is in some sense optimal as the following shows: 


Proposition 9.4.4 Assume that $0\in T\subset\ell^2$. Then we can find a sequence $(\mathcal A_n)$ of admissible partitions of $T$ and for $A\in\mathcal A_n$ a number $j_n(A)$ such that conditions (9.51) and (9.42) are satisfied for $u=1$ and moreover
$$\sup_{x\in T}\sum_{n\ge0}2^nr^{-j_n(A_n(x))}\le K(r)\,b(T).\tag{9.53}$$
This will be proved in Sect. 10.15. The situation here is the same as for the 
generic chaining bound for Gaussian processes. There is no magic wand to discover 
the proper choice of the partitions A,, and in specific situations this can be done 
only by understanding the “geometry of T”, typically a very challenging problem. 

We should point out one of the (psychological) difficulties in discovering the proof of Theorem 6.2.8 (the Latała-Bednorz theorem). Even though it turns out from Proposition 9.4.4 that one can find partitions $\mathcal A_n$ such that (9.42) holds, when proving Theorem 6.2.8 it seems necessary to use partitions with a weaker property, which replaces the summation over all values of $i$ in (9.42) by the summation over the values in an appropriate subset $\Omega_n(t)$ of $\mathbb N^*$, as in (9.12) above.

Efficient bounds on random Fourier series as presented in Sect. 7.8.5 were first 
discovered as an application of Theorem 9.4.1. The advantage of this method is that 
it bypasses the magic of Theorem 7.8.1 while following a conceptually transparent 
scheme. It is sketched in the following exercises (the solution of which is given in 
the briefest of manners). 


Exercise 9.4.5 We consider complex numbers $a_i$ and characters $\chi_i$. We fix a number $r\ge4$ and we define $\psi_j(s,t)=\sum_i|r^ja_i(\chi_i(s)-\chi_i(t))|^2\wedge1$. Consider a parameter $w\ge1$ and for $n\ge0$ set $D_n=\{s\in T;\ \psi_{j_n}(s,0)\le w2^n\}$. Assume that $\mu(D_0)\ge3/4$ and that $\mu(D_n)\ge N_n^{-1}$ for $n\ge1$. Prove that there is an admissible sequence of partitions $(\mathcal A_n)$ of $T$ and integers $j'_n$ with $j'_0=j_0$ such that $\sum_{n\ge0}2^nr^{-j'_n}\le L\sum_{n\ge0}2^nr^{-j_n}$ and that $\psi_{j'_n}(s,t)\le Lw2^n$ for $s,t\in A\in\mathcal A_n$.
Exercise 9.4.6 Under the conditions of the previous exercise show that for each $p\ge1$ it holds that
$$\Big(\mathsf E\sup_{s\in T}\Big|\sum_i\varepsilon_ia_i(\chi_i(s)-\chi_i(0))\Big|^p\Big)^{1/p}\le K(r,p)\,w\sum_{n\ge0}2^nr^{-j_n}+K(r)\sum_i|a_i|\mathbf 1_{\{4|a_i|>r^{-j_0}\}}.\tag{9.54}$$


Key Ideas to Remember 


• We have introduced a method to measure the size of a set of measurable functions using a family of distances. Control of this size yields structural information about the set.

• Control of the size of a set of functions on $\mathbb N$ yields sharp bounds on the corresponding Bernoulli process.


9.5 Notes and Comments 


Theorem 9.4.1 can be thought of as an abstract version of Ossiander’s bracketing theorem (which we shall prove in Sect. 14.1). The author proved Theorem 9.2.1 (in an essentially equivalent form) as early as [115] and in the exact form presented here in [129], but did not understand then its potential as a chaining theorem. The version of this work at the time the author received [16] contained only Theorem 9.2.1, with a proof very similar to the proof of Theorem 9.2.4 which we present here.


Chapter 10
Proof of the Bernoulli Conjecture


The present chapter will use to the fullest a number of the previous ideas, and 
the reader should have fully mastered Chaps. 2 and 6 as well as the statement of 
Theorem 9.2.4. 

The overall strategy to prove the Bernoulli conjecture is somewhat similar to the 
one we used to prove Theorem 2.9.1: we recursively construct increasing families of 
partitions, and we measure the size of the elements of the partitions through certain 
functionals. Once this appropriate sequence has been constructed, the required 
decomposition will be provided by Theorem 9.2.4. 

Just as in the Gaussian case (Theorem 2.10.1), a main tool is the Sudakov minoration, which now refers to (6.21). In contrast with the Sudakov minoration in the Gaussian case, (2.116), we cannot use this result unless we control the supremum norm.

The basic tool to control the supremum norm is the method of “chopping maps” of Sect. 10.3. In this method, to each element $t\in\ell^2$ we associate a new element $t'\in\ell^2$, of which we control the supremum norm. The difficulty is that this operation decreases the distance, $d(t',u')\le d(t,u)$, and some of the information we had accumulated on the metric space $(T,d)$ is lost in that step. The method may “change the set $T$” (as well as the set of underlying Bernoulli r.v.s) at every step of the construction, and consequently we will need to change the functional at every step of the construction. Furthermore, this functional will depend on the element of the partition we try to split further, a new technical feature.

A main difference between the proof of the Bernoulli conjecture and that of Theorem 2.9.1 is that instead of having two really different possibilities in partitioning a set, as in Lemma 2.9.4, there will now be three different possibilities (as expressed in Lemma 10.7.3). A radically new idea occurs here, which has no equivalent in the previous results, and we start with it in Sect. 10.1.

As the reader will soon realize, the proof of the Bernoulli conjecture is rather demanding. One reason probably is that there are missing pieces to our understanding. Finding a more transparent proof is certainly a worthy research project. The good news, however, is that understanding the details of this proof is certainly not required to continue reading this book, since the ideas on which the proof relies will not be met any more. It is however essential to understand well the meaning of the results of Sect. 10.15, which are the basis of the fundamental results of the next chapter.

I will not hesitate to state that the Latała-Bednorz theorem is the most magnificent piece of mathematics I ever came across¹ and that I find that it is well worth making some effort to understand it. Not only are several beautiful ideas required, but the way they are knitted together is simply breathtaking. I am greatly indebted to Kevin Tanguy for his help in making this chapter more accessible.


10.1 Latała’s Principle


The principle proved in this section is a key to the entire chapter.² It was proved first by Rafał Latała in [49], but it was not obvious at the time how important this is.³

Consider a subset $J$ of $\mathbb N^*:=\mathbb N\setminus\{0\}$ and a subset $T$ of $\ell^2$. Our goal is to compare the processes $(\sum_{i\in J}\varepsilon_it_i)_{t\in T}$ and $(\sum_{i\ge1}\varepsilon_it_i)_{t\in T}$. We define $b_J(T):=\mathsf E\sup_{t\in T}\sum_{i\in J}\varepsilon_it_i$. Thus $b_J(T)\le b(T)$ (by Jensen’s inequality, integrating first in the $\varepsilon_i$ for $i\notin J$). It may be the case that $b_J(T)=b(T)$ as, for example, when $t_i=0$ for $i\notin J$ and $t\in T$. Note that in that case the diameter of $T$ for the canonical distance $d$ is the same as its diameter for the smaller distance $d_J$ given by $d_J^2(s,t)=\sum_{i\in J}(s_i-t_i)^2$. This brings us to consider two typical (nonexclusive) situations:

• First, the diameter of $T$ for $d$ is about the same as its diameter for $d_J$.
• Second, $b_J(T)$ is significantly smaller than $b(T)$.

Latała’s principle states that if one also controls the supremum norm, the set $T$ can be decomposed into not too many pieces which satisfy one of these two conditions.
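For a finite $T$ and finitely many coordinates, $b(T)$ and $b_J(T)$ can be computed exactly by averaging over all sign patterns. The sketch below (toy data chosen arbitrarily; coordinates are indexed from 0 rather than from 1, and the helper name is hypothetical) illustrates $b_J(T)\le b(T)$ and the equality case where $t_i=0$ for $i\notin J$.

```python
import math
from itertools import product

def expected_sup(T, coords):
    """E sup_{t in T} sum_{i in coords} eps_i t_i for independent Bernoulli signs eps_i,
    computed exactly by enumerating all 2^n sign patterns (n = number of coordinates)."""
    n = len(T[0])
    patterns = list(product([-1, 1], repeat=n))
    total = 0.0
    for eps in patterns:
        total += max(sum(eps[i] * t[i] for i in coords) for t in T)
    return total / len(patterns)

T = [(0, 0, 0, 0), (1.0, 0.5, -2.0, 0.3), (-0.4, 1.2, 0.7, -1.5), (0.9, -0.9, 0.1, 2.2)]
J = [0, 1]                                    # a sample subset J of the coordinates
b_T, b_J = expected_sup(T, range(4)), expected_sup(T, J)
assert 0 <= b_J <= b_T                        # 0 is in T, and b_J(T) <= b(T)

# Equality case: every t vanishes outside J.
T0 = [(0, 0, 0, 0), (1.0, 0.5, 0, 0), (-0.4, 1.2, 0, 0)]
assert math.isclose(expected_sup(T0, J), expected_sup(T0, range(4)))
```

Exhaustive enumeration is only feasible for a handful of coordinates; it is used here so that the comparison involves no sampling error.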


Proposition 10.1.1 (Latała’s Principle) There exists a constant $L$ with the following property. Consider a subset $T$ of $\ell^2$ and a subset $J$ of $\mathbb N^*$. Assume that for certain numbers $c,\sigma>0$ and an integer $m\ge2$ the following holds:
$$\forall s,t\in T,\quad\sum_{i\in J}(s_i-t_i)^2\le c^2,\tag{10.1}$$


¹ A statement that has to be qualified by the fact that I do not read anything!

² It is in homage to this extraordinary result that I have decided to violate alphabetical order and to call Theorem 6.2.8 the Latała-Bednorz theorem. This is my personal decision, and impressive contributions of Witold Bednorz to this area of probability theory are brought to light in particular in Chap. 16.

³ The simpler proof presented here comes from [17].

⁴ The resulting information will allow us to remove some of the Bernoulli r.v.s at certain steps of our construction.
our construction. 
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teET => IX Ilo < (10.2) 


80 
Jlogm — 
Then provided 


cx 


, (10.3) 


2 |S 


we can find m' < m+ 1 anda partition (Ag)¢<m of T such that for each € < m' we 
have either? 


ato eT, Ae C Bo(t*,o), (10.4) 
or else® 
by(Ag) < OCT) = + Viogm (10.5) 


In (10.1) we assume that the diameter of $T$ is small for the small distance $d_J$. We then produce sets $A_\ell$ on which extra information has been gained: they are either of small diameter for the large distance $d=d_{\mathbb{N}^*}$ as in (10.4), or they satisfy (10.5). The pieces of information in (10.4) and (10.5) are of a different nature. It is instructive to compare this statement with Lemma 2.9.4. We also note that we require control in supremum norm through (10.2). Indeed, this control is required to use the Sudakov minoration for Bernoulli processes (Theorem 6.4.1), which is a main ingredient of the proof.


Proof Let us fix a point $t_0$ of $T$ and replace $T$ by $T-t_0$. We then have

$$\forall t\in T,\quad\sum_{i\in J}t_i^2\le c^2 \tag{10.6}$$

and

$$\forall t\in T,\quad\|t\|_\infty\le\frac{16\sigma}{\sqrt{\log m}}. \tag{10.7}$$

For $t\in T$ set

$$Y_t=\sum_{i\in J}\varepsilon_i t_i\ ;\qquad Z_t=\sum_{i\notin J}\varepsilon_i t_i, \tag{10.8}$$

so that

$$b(T)=\mathrm{E}\sup_{t\in T}(Y_t+Z_t). \tag{10.9}$$

⁵ As usual the ball $B(t^\ell,\sigma)$ below is the ball for the distance $d$.

⁶ In fact, all the sets $A_\ell$ satisfy (10.4) except at most one which satisfies (10.5).

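We may assume that $T$ cannot be covered by $m$ balls of the type $B(t,\sigma)$, for the result is obvious otherwise. It thus makes sense to define

$$a=\inf_{F\subset T,\ \mathrm{card}\,F\le m}\ \mathrm{E}\sup_{t\in T\setminus\bigcup_{s\in F}B(s,\sigma)}Y_t.$$

To prove the proposition, we shall prove that provided the constant $L_1$ of (10.3) is large enough, we have

$$a\le b(T)-\frac{\sigma}{L}\sqrt{\log m}. \tag{10.10}$$

Indeed, consider a set $F=\{t^1,\dots,t^m\}\subset T$ such that

$$\mathrm{E}\sup_{t\in T\setminus\bigcup_{s\in F}B(s,\sigma)}Y_t=\mathrm{E}\sup_{t\in T\setminus\bigcup_{\ell\le m}B(t^\ell,\sigma)}Y_t\le b(T)-\frac{\sigma\sqrt{\log m}}{L}.$$

The required partition is then obtained by choosing $A_\ell\subset B(t^\ell,\sigma)$ for $\ell\le m$ and $A_{m+1}=T\setminus\bigcup_{\ell\le m}B(t^\ell,\sigma)$.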
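We turn to the proof of (10.10). By definition of $a$, given $F\subset T$ with $\mathrm{card}\,F\le m$, the r.v.

$$W:=\sup_{t\in T\setminus\bigcup_{s\in F}B(s,\sigma)}Y_t\quad\text{satisfies}\quad\mathrm{E}W\ge a. \tag{10.11}$$

Moreover, using (10.6) and (6.14) with $a(t)=0$, we obtain

$$\forall u>0,\quad\mathrm{P}(|W-\mathrm{E}W|\ge u)\le L\exp\Big(-\frac{u^2}{Lc^2}\Big). \tag{10.12}$$

Let us consider independent copies $(Y_t^k)_{t\in T}$ of the process $(Y_t)_{t\in T}$ (which are also independent of the r.v.s $(\varepsilon_i)_{i\ge1}$) and a small number $\epsilon>0$. First, we consider $W_1:=\sup_{t\in T}Y_t^1$ and we select a point $t^1\in T$ (depending on the r.v.s $Y_t^1$) with

$$Y^1_{t^1}\ge W_1-\epsilon. \tag{10.13}$$

Next, we let $W_2=\sup_{t\in T\setminus B(t^1,\sigma)}Y_t^2$ and we find $t^2\notin B(t^1,\sigma)$ such that

$$Y^2_{t^2}\ge W_2-\epsilon. \tag{10.14}$$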




We proceed in this manner, constructing points $t^k$ with $Y^k_{t^k}\ge W_k-\epsilon$ (where $W_k=\sup_{t\notin\bigcup_{\ell<k}B(t^\ell,\sigma)}Y_t^k$) and $t^k\notin\bigcup_{\ell<k}B(t^\ell,\sigma)$, until we construct a last point $t^m$.
The proof of (10.10) will follow from appropriate upper and lower bounds for the quantity

$$S:=\mathrm{E}\max_{k\le m}\big(Y^k_{t^k}+Z_{t^k}\big). \tag{10.15}$$

These bounds are themselves obtained in the most natural manner using concentration of measure and Sudakov's minoration. To find a lower bound, we write

$$\max_{k\le m}(Y^k_{t^k}+Z_{t^k})\ge\max_{k\le m}(W_k+Z_{t^k})-\epsilon\ge\min_{k\le m}W_k+\max_{k\le m}Z_{t^k}-\epsilon, \tag{10.16}$$

and we proceed to evaluate the expected value of the right-hand side. First, fix a value of $k\le m$. Using (10.11) given the points $t^1,\dots,t^{k-1}$ shows that $\mathrm{E}W_k\ge a$, because the process $(Y_t^k)$ is independent of $t^1,\dots,t^{k-1}$. Using (10.12) given $Y^1,\dots,Y^{k-1}$ (and $t^1,\dots,t^{k-1}$), we obtain that for $u>0$ we have $\mathrm{P}(|W_k-\mathrm{E}W_k|\ge u)\le L\exp(-u^2/(Lc^2))$, and proceeding as in (2.123) we get $\mathrm{E}V\le Lc\sqrt{\log m}$ where $V=\max_{k\le m}|W_k-\mathrm{E}W_k|$. Since $W_k\ge\mathrm{E}W_k-V\ge a-V$, we obtain

$$\mathrm{E}\min_{k\le m}W_k\ge a-Lc\sqrt{\log m}. \tag{10.17}$$


Next, denoting by $\mathrm{E}_{J^c}$ expectation in the r.v.s $(\varepsilon_i)_{i\in J^c}$ only, we prove that

$$\mathrm{E}_{J^c}\max_{k\le m}Z_{t^k}\ge\frac1L\sigma\sqrt{\log m}. \tag{10.18}$$

For this we observe that for $s,t\in T$ with $d(s,t)=\|s-t\|_2\ge\sigma$, using (10.1) and assuming without loss of generality $L_1\ge2$ in (10.3),

$$d_{J^c}(s,t)^2=\sum_{i\notin J}(s_i-t_i)^2=\sum_{i\ge1}(s_i-t_i)^2-\sum_{i\in J}(s_i-t_i)^2\ge\sigma^2-c^2\ge(\sigma/2)^2.$$

Now, by construction, for $k,\ell\le m$, $k\ne\ell$ we have $d(t^k,t^\ell)\ge\sigma$. We then apply Theorem 6.4.1 (the Sudakov minoration for Bernoulli processes) to $J^c$, and so, using also (10.7), (10.18) follows from (6.21). Taking expectation in (10.18) with respect to the r.v.s $\varepsilon_i$, $i\in J$ yields

$$\mathrm{E}\max_{k\le m}Z_{t^k}\ge\frac1L\sigma\sqrt{\log m}.$$

Taking expectation in (10.16) and letting $\epsilon\to0$ we have proved that

$$S\ge a+\Big(\frac{\sigma}{L_2}-L_3c\Big)\sqrt{\log m}. \tag{10.19}$$
The next goal is to bound $S$ from above. We first observe that

$$S\le\mathrm{E}\max_{k\le m}\sup_{t\in T}(Y_t^k+Z_t). \tag{10.20}$$

Consider then some numbers $(a(t))_{t\in T}$. Using (6.14) and (10.6) we obtain

$$\forall u>0,\quad\mathrm{P}\Big(\Big|\sup_{t\in T}(Y_t^k+a(t))-\mathrm{E}\sup_{t\in T}(Y_t+a(t))\Big|\ge u\Big)\le L\exp\Big(-\frac{u^2}{Lc^2}\Big).$$

Proceeding as in (2.123) we obtain

$$\mathrm{E}\max_{k\le m}\Big|\sup_{t\in T}(Y_t^k+a(t))-\mathrm{E}\sup_{t\in T}(Y_t+a(t))\Big|\le Lc\sqrt{\log m},$$

and finally

$$\mathrm{E}\max_{k\le m}\sup_{t\in T}(Y_t^k+a(t))\le\mathrm{E}\sup_{t\in T}(Y_t+a(t))+Lc\sqrt{\log m}. \tag{10.21}$$


Let us recall that $Y_t^k$ does not depend on the r.v.s $(\varepsilon_i)_{i\in J^c}$, but only on the r.v.s $(\varepsilon_i)_{i\in J}$. Thus, denoting by $\mathrm{E}_J$ expectation in the r.v.s $(\varepsilon_i)_{i\in J}$ only (given the r.v.s $(\varepsilon_i)_{i\in J^c}$), we may rewrite (10.21) as

$$\mathrm{E}_J\max_{k\le m}\sup_{t\in T}(Y_t^k+a(t))\le\mathrm{E}_J\sup_{t\in T}(Y_t+a(t))+Lc\sqrt{\log m}. \tag{10.22}$$

Since $Z_t$ depends only on the r.v.s $(\varepsilon_i)_{i\in J^c}$, given these r.v.s $Z_t$ is just a number $a(t)$. Thus (10.22) implies

$$\mathrm{E}_J\max_{k\le m}\sup_{t\in T}(Y_t^k+Z_t)\le\mathrm{E}_J\sup_{t\in T}(Y_t+Z_t)+Lc\sqrt{\log m}.$$

Taking expectation and using (10.9) yields

$$\mathrm{E}\max_{k\le m}\sup_{t\in T}(Y_t^k+Z_t)\le b(T)+L_3c\sqrt{\log m}. \tag{10.23}$$

Combining with (10.20) and (10.19), we obtain

$$a+\Big(\frac{\sigma}{L_2}-L_3c\Big)\sqrt{\log m}\le b(T)+L_3c\sqrt{\log m},$$

so that $a\le b(T)-(\sigma/L_2-2L_3c)\sqrt{\log m}$, and indeed (10.10) holds true provided $c\le\sigma/(4L_2L_3)$. □
10.2 Philosophy, I


Let us look at a high level at the work of the previous section. Given a subset $J$ of $\mathbb{N}^*$, it is a completely natural question to ask when $b(T)$ is significantly larger than $b_J(T)$. One obvious way to guarantee this is as follows: for the typical value of $(\varepsilon_i)_{i\in J}$, the set $T'$ of $t\in T$ for which $\sum_{i\in J}\varepsilon_i t_i\simeq\sup_{t'\in T}\sum_{i\in J}\varepsilon_i t'_i$ has to be such that $\mathrm{E}\sup_{t\in T'}\sum_{i\notin J}\varepsilon_i t_i$ is not very small. This is not the case in general. For example, it may happen that the sequences $(t_i)_{i\notin J}$ are the same for all $t\in T$, and then $b(T)=b_J(T)$. More generally, it may happen that the sequence $(t_i)_{i\notin J}$ takes only a few values as $t$ ranges over $T$, and then $b(T)$ and $b_J(T)$ will be very close to each other. In some precise sense, Latała's principle states that the phenomenon just described is the only possibility for $b(T)$ to be about $b_J(T)$. Under (10.1) to (10.3) the set $T$ can be decomposed into a few pieces $A$ for which either $b_J(A)$ is significantly less than $b(A)$, or the set of sequences $(t_i)_{i\notin J}$, $t\in A$, is of small diameter (which, under condition (10.1), means that $A$ itself is of small diameter).


10.3 Chopping Maps and Functionals


10.3.1 Chopping Maps 


One of the most successful ideas about Bernoulli processes is that of chopping maps. The basic idea of chopping maps is to replace the individual r.v.s $\varepsilon_i x_i$ by a sum $\sum_j\varepsilon_{i,j}x_{i,j}$ where $\varepsilon_{i,j}$ are independent Bernoulli r.v.s and where $x_{i,j}$ are “small pieces of $x_i$”. It is then easier to control the $\ell^\infty$ norm of the new process. This control is fundamental to be able to apply the Sudakov minoration (6.21) and its consequence Corollary 6.4.10, which are key elements of the proof of the Bernoulli conjecture.

Given $u<v\in\mathbb{R}$ we define the function $\varphi_{u,v}$ as the unique continuous function for which $\varphi_{u,v}(0)=0$, which is constant for $x\le u$ and $x\ge v$ and has slope 1 between these values, see Fig. 10.1. Thus

$$\varphi_{u,v}(x)=\min(v,\max(x,u))-\min(v,\max(0,u)). \tag{10.24}$$

Consequently

$$|\varphi_{u,v}(x)|\le v-u \tag{10.25}$$

and

$$|\varphi_{u,v}(x)-\varphi_{u,v}(y)|\le|x-y|, \tag{10.26}$$

with equality when $u\le x,y\le v$.
Fig. 10.1 The graph of $\varphi_{u,v}$, to the left when $u<0<v$, to the right when $0<u<v$
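A direct transcription of (10.24) makes (10.25) and (10.26) easy to test numerically. The Python sketch below is a randomized sanity check, not a substitute for the one-line proofs; the function names `phi` and `check_phi` are hypothetical.

```python
import random

def phi(u, v, x):
    """phi_{u,v}(x) of (10.24): continuous, phi(0) = 0, constant for x <= u
    and for x >= v, slope 1 in between (requires u < v)."""
    assert u < v
    return min(v, max(x, u)) - min(v, max(0.0, u))

def check_phi(trials=10_000, seed=0):
    """Test (10.25), (10.26) and the equality case on random data."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = rng.uniform(-3, 3)
        v = u + rng.uniform(0.01, 3)
        x, y = rng.uniform(-5, 5), rng.uniform(-5, 5)
        assert phi(u, v, 0.0) == 0.0
        assert abs(phi(u, v, x)) <= v - u + 1e-12                      # (10.25)
        assert abs(phi(u, v, x) - phi(u, v, y)) <= abs(x - y) + 1e-12  # (10.26)
        if u <= x <= v and u <= y <= v:                                # equality case
            assert abs(abs(phi(u, v, x) - phi(u, v, y)) - abs(x - y)) < 1e-12
    return True

# The two shapes of Fig. 10.1: u < 0 < v, then 0 < u < v.
print(phi(-1.0, 2.0, 5.0), phi(-1.0, 2.0, -5.0), phi(1.0, 2.0, 5.0), phi(1.0, 2.0, 0.0))
```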


It is very useful to note that if $u_1<u_2<\dots<u_k$, then

$$\varphi_{u_1,u_k}=\sum_{1\le\ell<k}\varphi_{u_\ell,u_{\ell+1}}. \tag{10.27}$$

This is simply because both the left-hand side and the right-hand side are continuous, constant for $x\le u_1$ and $x\ge u_k$, have slope 1 between these values, and take the value 0 at 0.

Given a finite subset $G$ of $\mathbb{R}$ we define

$$G^-:=\{u\in G\,;\ \exists v\in G,\ u<v\},$$

and for $u\in G^-$ we define $u^+=\min\{v\in G\,;\ u<v\}$, which we will call the successor of $u$. It will always be implicitly assumed that $\mathrm{card}\,G\ge2$ so that $G^-\ne\emptyset$. In other words, if we enumerate $G=\{u_1,\dots,u_n\}$ where the sequence $(u_k)_{1\le k\le n}$ is increasing, we have $G^-=\{u_1,\dots,u_{n-1}\}$ and for $k<n$ we have $u_k^+=u_{k+1}$. To form a mental picture, let us say that sets $G$ consisting of a few evenly spaced points, say points of the type $p\alpha$ for $p\in\mathbb{Z}$, $p_0\le p\le p_1$ (with $\alpha\in\mathbb{R}^+$, $p_0,p_1\in\mathbb{Z}$), will be essential, although some slightly different situations will also be considered. A simple idea is that the family of numbers $\varphi_{u,u^+}(x)$ for $u\in G^-$ gives us good control of the values of $x$ such that $\min G\le x\le\max G$. Different sets $G_i$ will be considered to control each of the values of the different coordinates $t_i$ of $t$ in an appropriate range. As a consequence of (10.27), we obtain

$$\varphi_{\min G,\max G}=\sum_{u\in G^-}\varphi_{u,u^+}. \tag{10.28}$$
Lemma 10.3.1 For each $x,y\in\mathbb{R}$ and each finite set $G$, we have

$$\sum_{u\in G^-}|\varphi_{u,u^+}(x)-\varphi_{u,u^+}(y)|\le|x-y|. \tag{10.29}$$

Moreover there is equality if $\min G\le x,y\le\max G$.

Proof It suffices to prove (10.29) when $x\ge y$. As a consequence of the fact that $\varphi_{u,u^+}$ is non-decreasing, the absolute values may be removed in the left-hand side, and (10.29) is then a consequence of (10.28) and (10.26). □
In particular, since $\varphi_{u,u^+}(0)=0$, we have

$$\sum_{u\in G^-}|\varphi_{u,u^+}(x)|\le|x|. \tag{10.30}$$
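The pieces $\varphi_{u,u^+}(x)$, $u\in G^-$, are easy to compute once $G$ is sorted. The following Python sketch (hypothetical helper names; a randomized check, not a proof) tests the telescoping identity (10.28), Lemma 10.3.1 with its equality case, and (10.30).

```python
import random

def phi(u, v, x):
    """phi_{u,v}(x) of (10.24)."""
    return min(v, max(x, u)) - min(v, max(0.0, u))

def pieces(G, x):
    """The numbers phi_{u,u+}(x), u in G^-, where u+ is the successor of u in G."""
    g = sorted(set(G))
    assert len(g) >= 2, "card G >= 2 is assumed"
    return [phi(a, b, x) for a, b in zip(g, g[1:])]

def check_lemma_10_3_1(trials=5_000, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        G = [rng.uniform(-4, 4) for _ in range(rng.randint(2, 8))]
        x, y = rng.uniform(-6, 6), rng.uniform(-6, 6)
        px, py = pieces(G, x), pieces(G, y)
        assert abs(sum(px) - phi(min(G), max(G), x)) < 1e-9      # (10.28)
        lhs = sum(abs(a - b) for a, b in zip(px, py))
        assert lhs <= abs(x - y) + 1e-9                          # (10.29)
        if min(G) <= x <= max(G) and min(G) <= y <= max(G):
            assert abs(lhs - abs(x - y)) < 1e-9                  # equality case
        assert sum(abs(a) for a in px) <= abs(x) + 1e-9          # (10.30)
    return True

print(pieces([0.0, 1.0, 2.0], 1.5))   # x = 1.5 is chopped into 1.0 and 0.5
```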


In the remainder of this chapter we consider independent Bernoulli r.v.s $\varepsilon_{x,i}$ for $x\in\mathbb{R}$ and $i\in\mathbb{N}^*$. These are also assumed to be independent of all other Bernoulli r.v.s considered, in particular the $\varepsilon_i$.

Consider now for $i\ge1$ a finite set $G_i\subset\mathbb{R}$. For $t\in\ell^2$ we consider the r.v.

$$X_t(G_i,i)=\sum_{u\in G_i^-}\varepsilon_{u,i}\,\varphi_{u,u^+}(t_i). \tag{10.31}$$

That is, the value $t_i$ is “chopped” into the potentially smaller pieces $\varphi_{u,u^+}(t_i)$.


Exercise 10.3.2 Consider $t,t'\in\ell^2$. Show that if $t_i,t'_i\ge\max G_i$ or if $t_i,t'_i\le\min G_i$ then $X_t(G_i,i)=X_{t'}(G_i,i)$.


We chop the value of $t_i$ for all values of $i$. We write $\mathcal{G}=(G_i)_{i\ge1}$ and we consider the r.v.

$$X_t(\mathcal{G})=\sum_{i\ge1}X_t(G_i,i)=\sum_{i\ge1}\sum_{u\in G_i^-}\varepsilon_{u,i}\,\varphi_{u,u^+}(t_i). \tag{10.32}$$

Combining (10.30) with the inequality $\sum_k a_k^2\le(\sum_k|a_k|)^2$, we obtain that for $t\in\ell^2$ and $i\ge1$ we have $\sum_{u\in G_i^-}\varphi_{u,u^+}(t_i)^2\le t_i^2$, so that $\sum_{i\ge1}\sum_{u\in G_i^-}\varphi_{u,u^+}(t_i)^2\le\|t\|_2^2$ and the series (10.32) converges. In this manner, to a Bernoulli process $(X_t)_{t\in T}$ we associate a new Bernoulli process $(X_t(\mathcal{G}))_{t\in T}$. Again the idea is that each value $t_i$ is chopped into little pieces, and we should think of $\mathcal{G}$ as a parameter giving us the recipe to do that. Another way to look at this is as follows. Consider the index set

$$J^*=\{(i,u)\,;\ i\in\mathbb{N}^*,\ u\in G_i^-\}, \tag{10.33}$$

and the map $\Phi:\ell^2(\mathbb{N}^*)\to\ell^2(J^*)$ given by

$$\Phi(t)=(\Phi(t)_j)_{j\in J^*}\quad\text{where}\quad\Phi(t)_j=\varphi_{u,u^+}(t_i)\ \text{when}\ j=(i,u). \tag{10.34}$$

Then, replacing the process $(X_t)_{t\in T}$ by the process $(X_t(\mathcal{G}))_{t\in T}$ amounts to replacing the set $T$ by the set $\Phi(T)$. The gain is that we now control $\|t\|_\infty$ for $t\in\Phi(T)$. We state this crucial fact explicitly.


Lemma 10.3.3 For $t\in\Phi(T)$ we have

$$\|t\|_\infty\le\max\{u^+-u\,;\ i\ge1,\ u\in G_i^-\}.$$
This is an obvious consequence of (10.25). In order to take advantage of this bound, it is efficient to consider sets $G_i$ consisting of evenly spaced points, as we will do. Lemma 10.3.3 is the purpose of the whole construction: controlling the supremum norm allows for use of the Sudakov minoration. Another important idea is that according to (10.29) and the inequality $\sum_k a_k^2\le(\sum_k|a_k|)^2$, the canonical distance $d_{\mathcal{G}}$ associated with the new process satisfies

$$d_{\mathcal{G}}(s,t)^2=\sum_{i\ge1}\sum_{u\in G_i^-}\big(\varphi_{u,u^+}(s_i)-\varphi_{u,u^+}(t_i)\big)^2\le\sum_{i\ge1}(s_i-t_i)^2=d(s,t)^2. \tag{10.35}$$


The problem is that the reverse inequality is by no means true and that a set can very well be of small diameter for $d_{\mathcal{G}}$ but not for $d$. This is in a sense the main difficulty in using chopping maps. We shall soon discover how brilliantly Bednorz and Latała bypassed this difficulty using Proposition 10.1.1.
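The map $\Phi$ of (10.34) and the distance $d_{\mathcal{G}}$ of (10.35) can be illustrated on two points of $\mathbb{R}^2$. In the Python sketch below (helper names hypothetical, data chosen for illustration), each $G_i$ is a grid of mesh $0.5$ on $[-1,1]$. The chopped coordinates are bounded by the mesh as in Lemma 10.3.3, and $d_{\mathcal{G}}(s,t)\approx1.02$ while $d(s,t)\approx12$, so a set can be small for $d_{\mathcal{G}}$ without being small for $d$.

```python
import math

def phi(u, v, x):
    """phi_{u,v}(x) of (10.24)."""
    return min(v, max(x, u)) - min(v, max(0.0, u))

def chop(t, G):
    """Phi(t) of (10.34): G[i] is the finite set used for coordinate i; the
    output lists phi_{u,u+}(t_i) over (i, u) in J*, coordinate by coordinate."""
    out = []
    for ti, Gi in zip(t, G):
        g = sorted(set(Gi))
        out.extend(phi(a, b, ti) for a, b in zip(g, g[1:]))
    return out

def dist(a, b):
    """Euclidean distance between two finite sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

grid = [-1.0, -0.5, 0.0, 0.5, 1.0]          # evenly spaced, mesh 0.5
G = [grid, grid]
s, t = (0.3, 5.0), (0.1, -7.0)
Ps, Pt = chop(s, G), chop(t, G)
print(max(abs(x) for x in Ps + Pt))          # Lemma 10.3.3: at most the mesh
print(dist(Ps, Pt), dist(s, t))              # (10.35): d_G(s,t) <= d(s,t)
```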

The following is fundamental. It asserts that in a sense the size of the process $X_t(\mathcal{G})$ is smaller than the size of the original process $X_t$.


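Proposition 10.3.4 For any family $\mathcal{G}=(G_i)_{i\ge1}$ of finite subsets $G_i\subset\mathbb{R}$ and any finite set $T\subset\ell^2$, we have

$$\mathrm{E}\sup_{t\in T}X_t(\mathcal{G})\le b(T)=\mathrm{E}\sup_{t\in T}\sum_{i\ge1}\varepsilon_i t_i. \tag{10.36}$$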


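Proof The families $(\varepsilon_{u,i})$ and $(\varepsilon_i\varepsilon_{u,i})$ have the same distribution, so that

$$\begin{aligned}\mathrm{E}\sup_{t\in T}X_t(\mathcal{G})&=\mathrm{E}\sup_{t\in T}\sum_{i\ge1,\,u\in G_i^-}\varepsilon_{u,i}\varphi_{u,u^+}(t_i)\\&=\mathrm{E}\sup_{t\in T}\sum_{i\ge1,\,u\in G_i^-}\varepsilon_i\varepsilon_{u,i}\varphi_{u,u^+}(t_i)\\&=\mathrm{E}\Big(\mathrm{E}_\varepsilon\sup_{t\in T}\sum_{i\ge1}\varepsilon_i\theta_i(t_i)\Big),\end{aligned} \tag{10.37}$$

where the function $\theta_i$ is defined by $\theta_i(x)=\sum_{u\in G_i^-}\varepsilon_{u,i}\varphi_{u,u^+}(x)$ and where $\mathrm{E}_\varepsilon$ means expectation only in $(\varepsilon_i)_{i\ge1}$. We note that $\theta_i$ is a contraction, since

$$|\theta_i(x)-\theta_i(y)|\le\sum_{u\in G_i^-}|\varphi_{u,u^+}(x)-\varphi_{u,u^+}(y)|\le|x-y|$$

by (10.29). The key point is (6.42), which implies

$$\mathrm{E}_\varepsilon\sup_{t\in T}\sum_{i\ge1}\varepsilon_i\theta_i(t_i)\le\mathrm{E}\sup_{t\in T}\sum_{i\ge1}\varepsilon_i t_i=b(T).$$

Combining with (10.37) finishes the proof. □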
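For a small finite set $T$, both sides of (10.36) can be evaluated exactly by enumerating the signs, since $\mathrm{E}\sup_{t\in T}X_t(\mathcal{G})=b(\Phi(T))$. The Python sketch below uses hypothetical names, and the sets $T$ and $G_i$ are arbitrary illustrative choices; Proposition 10.3.4 predicts that the chopped quantity is never the larger one, which the sketch also checks on random small examples.

```python
import random
from itertools import product

def phi(u, v, x):
    """phi_{u,v}(x) of (10.24)."""
    return min(v, max(x, u)) - min(v, max(0.0, u))

def chop(t, G):
    """Phi(t) of (10.34) for finitely many coordinates."""
    out = []
    for ti, Gi in zip(t, G):
        g = sorted(set(Gi))
        out.extend(phi(a, b, ti) for a, b in zip(g, g[1:]))
    return out

def b_exact(vectors):
    """E sup_k sum_j eps_j vectors[k][j], averaged exactly over all sign patterns."""
    n = len(vectors[0])
    total = 0.0
    for eps in product((-1, 1), repeat=n):
        total += max(sum(e * a for e, a in zip(eps, vec)) for vec in vectors)
    return total / 2 ** n

def random_check(trials=200, seed=5):
    """Compare both sides of (10.36) on random small T and random sets G_i."""
    rng = random.Random(seed)
    for _ in range(trials):
        T = [tuple(rng.uniform(-2, 2) for _ in range(2)) for _ in range(4)]
        G = [[rng.uniform(-2, 2) for _ in range(3)] for _ in range(2)]
        assert b_exact([chop(t, G) for t in T]) <= b_exact(T) + 1e-9
    return True

T = [(1.0, 0.2), (-0.4, 0.9), (0.3, -1.1), (0.0, 0.0)]
G = [[-0.5, 0.0, 0.5, 1.0], [-1.0, 0.0, 0.4]]
bT = b_exact(T)                           # b(T)
bG = b_exact([chop(t, G) for t in T])    # E sup_t X_t(G) = b(Phi(T))
print(bG, bT)
```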


The following exercise helps explain the nice behavior of chopping maps with respect to the $\ell^2$ and $\ell^1$ norms, which will be used in the next result. There is no reason to believe that the constants are optimal; they are just a reasonable choice.


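Exercise 10.3.5 Prove that for $x,y\in\mathbb{R}$ and $c\in\mathbb{R}^+$, we have

$$|x-y|^2\mathbf{1}_{\{|x-y|<c\}}+c|x-y|\mathbf{1}_{\{|x-y|\ge c\}}\le3\sum_{\ell\in\mathbb{Z}}|\varphi_{c\ell,c(\ell+1)}(x)-\varphi_{c\ell,c(\ell+1)}(y)|^2 \tag{10.38}$$

and

$$\sum_{\ell\in\mathbb{Z}}|\varphi_{c\ell,c(\ell+1)}(x)-\varphi_{c\ell,c(\ell+1)}(y)|^2\le|x-y|^2\mathbf{1}_{\{|x-y|<c\}}+c|x-y|\mathbf{1}_{\{|x-y|\ge c\}}. \tag{10.39}$$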
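Inequalities (10.38) and (10.39) can be checked numerically before attempting the exercise. In the Python sketch below (helper names hypothetical), only the cells $[c\ell,c(\ell+1)]$ that meet the segment between $x$ and $y$ contribute to the sum over $\ell\in\mathbb{Z}$, so a finite range of $\ell$ suffices.

```python
import math
import random

def phi(u, v, x):
    """phi_{u,v}(x) of (10.24)."""
    return min(v, max(x, u)) - min(v, max(0.0, u))

def grid_sum(x, y, c):
    """Sum over l in Z of |phi_{cl,c(l+1)}(x) - phi_{cl,c(l+1)}(y)|^2.
    Cells not meeting [min(x,y), max(x,y)] contribute 0, so a finite range
    of l (with a safety margin of one cell on each side) suffices."""
    lo = math.floor(min(x, y) / c) - 1
    hi = math.floor(max(x, y) / c) + 1
    return sum((phi(c * l, c * (l + 1), x) - phi(c * l, c * (l + 1), y)) ** 2
               for l in range(lo, hi + 1))

def truncated(x, y, c):
    """|x-y|^2 1_{|x-y|<c} + c|x-y| 1_{|x-y|>=c}."""
    d = abs(x - y)
    return d * d if d < c else c * d

def check_10_38_39(trials=2_000, seed=6):
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(-5, 5), rng.uniform(-5, 5)
        c = rng.uniform(0.05, 3.0)
        s = grid_sum(x, y, c)
        assert truncated(x, y, c) <= 3 * s + 1e-9     # (10.38)
        assert s <= truncated(x, y, c) + 1e-9         # (10.39)
    return True

# Four full cells of length 1/4 between 0 and 1: equality in (10.39).
print(grid_sum(0.0, 1.0, 0.25), truncated(0.0, 1.0, 0.25))
```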


I invented chopping maps to prove the following version of Sudakov minoration, which illustrates their power well. Compared with the Gaussian version (2.117) of Sudakov minoration, the balls $\epsilon B_2$ are enlarged into $\epsilon B_2+Lb(T)B_1$, where $B_1$ denotes the unit ball of $\ell^1$.

Proposition 10.3.6 There exists a constant $L$ such that for each subset $T$ of $\ell^2$ we have, for $\epsilon>0$,

$$\epsilon\sqrt{\log N(T,\epsilon B_2+Lb(T)B_1)}\le Lb(T),$$

where $N(T,C)$ is the smallest number of translates of $C$ that can cover $T$.


Exercise 10.3.7 will help you understand the formulation of this result and the need for the term $Lb(T)B_1$.


Proof Consider $c>0$ and the map $\Psi_c:\ell^2=\ell^2(\mathbb{N}^*)\to\ell^2(\mathbb{N}^*\times\mathbb{Z})$ given by $\Psi_c(t)=\big(\varphi_{c\ell,c(\ell+1)}(t_i)\big)_{(i,\ell)}$, and recall that according to (10.25) we have $\|t\|_\infty\le c$ for $t\in\Psi_c(T)$. It then follows from Sudakov's minoration for Bernoulli processes (6.23) that

$$b(\Psi_c(T))\ge\frac1L\min\Big(\epsilon\sqrt{\log N(\Psi_c(T),\epsilon B_2)},\ \frac{\epsilon^2}{c}\Big). \tag{10.40}$$

An obvious adaptation of Proposition 10.3.4 implies that $b(T)\ge b(\Psi_c(T))$. Combining with (10.40) for the value $c=\epsilon^2/(2Lb(T))$, where $L$ is as in (10.40), we get

$$b(T)\ge\min\Big(\frac1L\epsilon\sqrt{\log N(\Psi_c(T),\epsilon B_2)},\ 2b(T)\Big),$$

which implies that

$$Lb(T)\ge\epsilon\sqrt{\log N(\Psi_c(T),\epsilon B_2)}. \tag{10.41}$$

To conclude the proof it suffices to show that

$$N(\Psi_c(T),\epsilon B_2)\ge N(T,4\epsilon B_2+Lb(T)B_1). \tag{10.42}$$

Letting $N=N(\Psi_c(T),\epsilon B_2)$, the set $\Psi_c(T)$ is covered by $N$ balls of the type $z+\epsilon B_2$. It is therefore covered by $N$ balls of the type $z+2\epsilon B_2$ where now $z\in\Psi_c(T)$, i.e., it is covered by $N$ balls of the type $\Psi_c(y)+2\epsilon B_2$. Thus $T$ is covered by $N$ sets of the type $\Psi_c^{-1}(\Psi_c(y)+2\epsilon B_2)$. Keeping in mind the value of $c$, it is enough to show that

$$\Psi_c^{-1}(\Psi_c(y)+2\epsilon B_2)\subset y+4\epsilon B_2+\frac{12\epsilon^2}{c}B_1. \tag{10.43}$$

Consider $x$ with $\Psi_c(x)\in\Psi_c(y)+2\epsilon B_2$. Then the right-hand side of (10.38), summed over the coordinates $i$, is $\le12\epsilon^2$, and this inequality implies that $(x-y)\mathbf{1}_{\{|x-y|<c\}}\in4\epsilon B_2$ and $(x-y)\mathbf{1}_{\{|x-y|\ge c\}}\in12\epsilon^2B_1/c$. Writing $x-y=(x-y)\mathbf{1}_{\{|x-y|<c\}}+(x-y)\mathbf{1}_{\{|x-y|\ge c\}}$ proves (10.43). □


Exercise 10.3.7 Take $T=B_1$, so that $b(T)=1$. Prove that if $\epsilon+\alpha<1$ it is not possible to cover $T$ by finitely many translates of the set $\epsilon B_2+\alpha B_1$.

Exercise 10.3.8 Deduce Proposition 10.3.6 from Theorem 6.2.8.


10.3.2 Basic Facts


For $i\ge1$ we consider finite sets $G_i\subset G'_i$. Letting $\mathcal{G}=(G_i)_{i\ge1}$ and $\mathcal{G}'=(G'_i)_{i\ge1}$, we now want to compare the processes $(X_t(\mathcal{G}))_t$ and $(X_t(\mathcal{G}'))_t$. We start by comparing the associated distances. We recall the formula (10.35) for the distance $d_{\mathcal{G}}$.


Proposition 10.3.9

(a) Assume that for a certain integer $q$

$$\forall i\in\mathbb{N}^*,\ \forall u\in G_i^-,\quad\mathrm{card}\big([u,u^+[\,\cap G'_i\big)\le q, \tag{10.44}$$

where $u^+$ is the successor of $u$ in $G_i$. Then

$$d_{\mathcal{G}}\le\sqrt q\,d_{\mathcal{G}'}. \tag{10.45}$$

(b) Assume that

$$\forall i\in\mathbb{N}^*,\quad\min G_i=\min G'_i,\ \max G_i=\max G'_i. \tag{10.46}$$

Then

$$d_{\mathcal{G}'}\le d_{\mathcal{G}}. \tag{10.47}$$


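Proof Throughout the proof we write $u$ for an element of $G_i^-$ and $u^+$ for its successor in $G_i$, and $v$ for an element of $G_i'^-$ and $v^+$ for its successor in $G'_i$. Thus, for $s,t\in T$ we have

$$d_{\mathcal{G}}(s,t)^2=\sum_{i\ge1}\sum_{u\in G_i^-}\big(\varphi_{u,u^+}(s_i)-\varphi_{u,u^+}(t_i)\big)^2 \tag{10.48}$$

and

$$d_{\mathcal{G}'}(s,t)^2=\sum_{i\ge1}\sum_{v\in G_i'^-}\big(\varphi_{v,v^+}(s_i)-\varphi_{v,v^+}(t_i)\big)^2. \tag{10.49}$$

Given $i\in\mathbb{N}^*$ and $u\in G_i^-$, let us define the set $G'_{i,u}=G'_i\cap[u,u^+[$. The sets $(G'_{i,u})_{u\in G_i^-}$ are disjoint subsets of $G_i'^-$. The union of these sets is $G_i'^-$ exactly when $\min G_i=\min G'_i$ and $\max G_i=\max G'_i$.

Next, consider $u\in G_i^-\subset G'_i$ and the largest element $v$ of $G'_{i,u}$. Since $G'_{i,u}\subset[u,u^+[$, we have $v<u^+\in G_i\subset G'_i$. Thus $v^+$, the smallest element of $G'_i$ which is $>v$, satisfies $v^+\le u^+$. But then $v^+=u^+$, for otherwise $v$ would not be the largest element of $G'_{i,u}$. It then follows from (10.27), and the fact that the functions $\varphi_{v,v^+}$ are non-decreasing, that

$$|\varphi_{u,u^+}(s_i)-\varphi_{u,u^+}(t_i)|=\sum_{v\in G'_{i,u}}|\varphi_{v,v^+}(s_i)-\varphi_{v,v^+}(t_i)|. \tag{10.50}$$

Thus, using the inequality $(\sum_{k\le q}a_k)^2\le q\sum_{k\le q}a_k^2$, and since under condition (10.44) we have $\mathrm{card}\,G'_{i,u}\le q$, we get

$$\big(\varphi_{u,u^+}(s_i)-\varphi_{u,u^+}(t_i)\big)^2\le q\sum_{v\in G'_{i,u}}\big(\varphi_{v,v^+}(s_i)-\varphi_{v,v^+}(t_i)\big)^2,$$

and plugging into (10.48) we obtain

$$d_{\mathcal{G}}(s,t)^2\le q\sum_{i\ge1}\sum_{u\in G_i^-}\sum_{v\in G'_{i,u}}\big(\varphi_{v,v^+}(s_i)-\varphi_{v,v^+}(t_i)\big)^2. \tag{10.51}$$

Now

$$\sum_{u\in G_i^-}\sum_{v\in G'_{i,u}}\big(\varphi_{v,v^+}(s_i)-\varphi_{v,v^+}(t_i)\big)^2\le\sum_{v\in G_i'^-}\big(\varphi_{v,v^+}(s_i)-\varphi_{v,v^+}(t_i)\big)^2$$

because each term in the double sum on the left is a term of the sum on the right. Using this in (10.51) and recalling (10.49), we have proved (10.45). Next, using again (10.50) as well as the inequality $(\sum_k|a_k|)^2\ge\sum_k a_k^2$, we obtain

$$d_{\mathcal{G}}(s,t)^2\ge\sum_{i\ge1}\sum_{u\in G_i^-}\sum_{v\in G'_{i,u}}\big(\varphi_{v,v^+}(s_i)-\varphi_{v,v^+}(t_i)\big)^2,$$

and we have observed that under (10.46) the union of the sets $G'_{i,u}$ for $u\in G_i^-$ is exactly $G_i'^-$, so that the right-hand side is then exactly $d_{\mathcal{G}'}(s,t)^2$, and we have proved (10.47) as well. □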
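Proposition 10.3.9 can be tested on random refinements. The Python sketch below (helper names hypothetical, parameters arbitrary) refines each $G_i=\{0,1,2\}$ by random points of $[0,2]$, so that (10.46) holds, computes the smallest admissible $q$ in (10.44), and checks (10.45) and (10.47) on random pairs $s,t$.

```python
import math
import random

def phi(u, v, x):
    """phi_{u,v}(x) of (10.24)."""
    return min(v, max(x, u)) - min(v, max(0.0, u))

def d_G(s, t, G):
    """Distance (10.48): G is a list of finite sets, one per coordinate."""
    tot = 0.0
    for si, ti, Gi in zip(s, t, G):
        g = sorted(set(Gi))
        tot += sum((phi(a, b, si) - phi(a, b, ti)) ** 2 for a, b in zip(g, g[1:]))
    return math.sqrt(tot)

def smallest_q(G, Gp):
    """Smallest q such that (10.44) holds: card([u, u+[ intersect G'_i) <= q."""
    q = 0
    for Gi, Gpi in zip(G, Gp):
        g = sorted(set(Gi))
        for a, b in zip(g, g[1:]):
            q = max(q, sum(1 for v in set(Gpi) if a <= v < b))
    return q

def check_prop_10_3_9(trials=2_000, seed=7):
    rng = random.Random(seed)
    for _ in range(trials):
        G = [[0.0, 1.0, 2.0] for _ in range(3)]
        Gp = [Gi + [rng.uniform(0.0, 2.0) for _ in range(rng.randint(0, 5))] for Gi in G]
        q = smallest_q(G, Gp)
        s = [rng.uniform(-1, 3) for _ in range(3)]
        t = [rng.uniform(-1, 3) for _ in range(3)]
        dg, dgp = d_G(s, t, G), d_G(s, t, Gp)
        assert dg <= math.sqrt(q) * dgp + 1e-9      # (10.45)
        assert dgp <= dg + 1e-9                      # (10.47), since (10.46) holds
    return True

print(smallest_q([[0.0, 1.0, 2.0]], [[0.0, 0.5, 1.0, 1.25, 1.5, 2.0]]))
```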


Proposition 10.3.10 For $i\ge1$, consider finite sets $G_i\subset G'_i$ and let $\mathcal{G}=(G_i)_{i\ge1}$ and $\mathcal{G}'=(G'_i)_{i\ge1}$. Assuming (10.46) we have

$$\mathrm{E}\sup_{t\in T}X_t(\mathcal{G}')\le\mathrm{E}\sup_{t\in T}X_t(\mathcal{G}). \tag{10.52}$$

This is a consequence of Proposition 10.3.4. It is quite intuitive, and the only difficulty is in the notation. We urge the reader to skip that tedious pensum until she has found motivation.


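Proof Consider the set $J^*=\{(i,u)\,;\ i\in\mathbb{N}^*,\ u\in G_i^-\}$ of (10.33) and the map $\Phi$ of (10.34). Let $T'=\Phi(T)$, so that

$$\mathrm{E}\sup_{t\in T}X_t(\mathcal{G})=\mathrm{E}\sup_{s\in T'}\sum_{j\in J^*}\varepsilon_j s_j,$$

where $\varepsilon_j=\varepsilon_{u,i}$ for $j=(i,u)\in J^*$. For each $j\in J^*$ consider a finite subset $H_j$ of $\mathbb{R}$. Denoting by $(\varepsilon'_{x,j})_{x\in\mathbb{R},\,j\in J^*}$ a new sequence of independent Bernoulli r.v.s, it follows from Proposition 10.3.4 that

$$\mathrm{E}\sup_{s\in T'}\sum_{j\in J^*}\sum_{v\in H_j^-}\varepsilon'_{v,j}\varphi_{v,v^+}(s_j)\le\mathrm{E}\sup_{s\in T'}\sum_{j\in J^*}\varepsilon_j s_j. \tag{10.53}$$

We will show that we can choose the sets $H_j$ so that the left-hand side is $\mathrm{E}\sup_{t\in T}X_t(\mathcal{G}')$, and this will conclude the proof.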

Let us start with a simple observation. To lighten notation, for $u<u'$ we set $\theta(u,u')=-\min(u',\max(0,u))$. We prove that

$$u\le v<v'\le u'\ \Rightarrow\ \varphi_{v,v'}(x)=\varphi_{v+\theta(u,u'),\,v'+\theta(u,u')}\big(\varphi_{u,u'}(x)\big). \tag{10.54}$$

First, we observe that (10.24) means that $\varphi_{u,u'}(x)=\min(u',\max(x,u))+\theta(u,u')$. Thus, as $x$ increases, the function $\varphi_{u,u'}(x)$ increases; it reaches the value $v+\theta(u,u')$ for $x=v$ and then the value $v'+\theta(u,u')$ for $x=v'$. Next, the function on the right-hand side of (10.54) is constant until $\varphi_{u,u'}(x)$ reaches the value $v+\theta(u,u')$, i.e., until $x=v$, then has slope 1, and then is constant again after $\varphi_{u,u'}(x)$ passes the value $v'+\theta(u,u')$, i.e., after $x=v'$. It is also 0 for $x=0$. Thus this function is $\varphi_{v,v'}$, which is characterized by these properties, and we have proved (10.54).
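Identity (10.54) can also be checked numerically. In the Python sketch below (names hypothetical), `theta` is the shift $\theta(u,u')$ defined above, and random values $u\le v<v'\le u'$ and $x$ are drawn.

```python
import random

def phi(u, v, x):
    """phi_{u,v}(x) of (10.24)."""
    return min(v, max(x, u)) - min(v, max(0.0, u))

def theta(u, up):
    """theta(u, u') = -min(u', max(0, u)), so phi_{u,u'}(x) = min(u', max(x, u)) + theta(u, u')."""
    return -min(up, max(0.0, u))

def check_10_54(trials=10_000, seed=2):
    rng = random.Random(seed)
    for _ in range(trials):
        u = rng.uniform(-3, 3)
        up = u + rng.uniform(0.1, 4)
        v = rng.uniform(u, up)
        vp = rng.uniform(v, up)
        if not v < vp:
            continue                      # (10.54) needs u <= v < v' <= u'
        x = rng.uniform(-6, 6)
        th = theta(u, up)
        lhs = phi(v, vp, x)
        rhs = phi(v + th, vp + th, phi(u, up, x))
        assert abs(lhs - rhs) < 1e-9
    return True

th = theta(-1.0, 3.0)
print(phi(1.0, 2.0, 5.0), phi(1.0 + th, 2.0 + th, phi(-1.0, 3.0, 5.0)))
```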
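For $j=(i,u)\in J^*$, let us define the set

$$H_j=\big(G'_i\cap[u,u^+]\big)+\theta(u,u^+),$$

so that, recalling the sets $G'_{i,u}=G'_i\cap[u,u^+[$, we have

$$H_j^-=G'_{i,u}+\theta(u,u^+) \tag{10.55}$$

when $j=(i,u)$. Using the definition of $H_j$, the left-hand side of (10.53) is then

$$\begin{aligned}&\mathrm{E}\sup_{s\in T'}\sum_{i\ge1}\sum_{u\in G_i^-}\sum_{v\in H_{(i,u)}^-}\varepsilon'_{v,(i,u)}\varphi_{v,v^+}(s_{(i,u)})\\&=\mathrm{E}\sup_{s\in T'}\sum_{i\ge1}\sum_{u\in G_i^-}\sum_{v\in G'_{i,u}}\varepsilon'_{v,(i,u)}\varphi_{v+\theta(u,u^+),\,v^++\theta(u,u^+)}(s_{(i,u)})\\&=\mathrm{E}\sup_{t\in T}\sum_{i\ge1}\sum_{u\in G_i^-}\sum_{v\in G'_{i,u}}\varepsilon'_{v,(i,u)}\varphi_{v+\theta(u,u^+),\,v^++\theta(u,u^+)}\big(\varphi_{u,u^+}(t_i)\big)\\&=\mathrm{E}\sup_{t\in T}\sum_{i\ge1}\sum_{u\in G_i^-}\sum_{v\in G'_{i,u}}\varepsilon'_{v,(i,u)}\varphi_{v,v^+}(t_i).\end{aligned} \tag{10.56}$$

Here, we use (10.55) in the second line. In the third line, we use the definition of $T'$:

$$T'=\{(\Phi(t)_j)_{j\in J^*}\,;\ t\in T\}=\{(\varphi_{u,u^+}(t_i))_{(i,u)\in J^*}\,;\ t\in T\}. \tag{10.57}$$

Finally we use (10.54) in the fourth line. Next, (10.46) ensures that $G_i'^-=\bigcup_{u\in G_i^-}G'_{i,u}$. Thus the sequence

$$(\varepsilon'_{v,(i,u)})_{i\in\mathbb{N}^*,\,u\in G_i^-,\,v\in G'_{i,u}}$$

is simply a copy of an independent sequence $(\varepsilon_{v,i})_{i\in\mathbb{N}^*,\,v\in G_i'^-}$, and the expression on the last line of (10.56) equals $\mathrm{E}\sup_{t\in T}\sum_{i\ge1}\sum_{v\in G_i'^-}\varepsilon_{v,i}\varphi_{v,v^+}(t_i)$. Since this last quantity is $\mathrm{E}\sup_{t\in T}X_t(\mathcal{G}')$, this concludes the proof. □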
10.3.3 Functionals


We are now ready to define the functionals which we will use to prove Theorem 6.2.8, but the motivation for these definitions will become clear only gradually. These functionals depend on four parameters: two integers $k\le h\in\mathbb{Z}$ (yes, $h$ denotes an integer), a point $w\in\ell^2$, and a subset $I$ of $\mathbb{N}^*$. We fix an integer $r\ge2$, which will be chosen later on. First, for $x\in\mathbb{R}$ and $k\in\mathbb{Z}$, we define the set

$$G(x,k)=\{pr^{-k}\,;\ p\in\mathbb{Z},\ |pr^{-k}-x|\le4r^{-k}\}. \tag{10.58}$$

If $pr^{-k}$ and $p'r^{-k}\in G(x,k)$, then $|p-p'|\le8$, so that $\mathrm{card}\,G(x,k)\le9$. We also observe that (see Fig. 10.2)

$$x-4r^{-k}\le\min G(x,k)\le x-3r^{-k}<x<x+3r^{-k}\le\max G(x,k)\le x+4r^{-k}. \tag{10.59}$$

Next, given $k\le h\in\mathbb{Z}$ and $x\in\mathbb{R}$, we define the set

$$G(x,k,h)=\{pr^{-h}\,;\ p\in\mathbb{Z},\ \min G(x,k)\le pr^{-h}\le\max G(x,k)\}. \tag{10.60}$$

In words, $G(x,k,h)$ consists of about $9\cdot r^{h-k}$ points evenly spaced (with a spacing of $r^{-h}$) roughly centered on the point $x$. We should think of $k$ as a scale parameter: the length of $G(x,k,h)$ is of order $r^{-k}$. We should think of $h$ as a “granularity parameter”: the distance between consecutive points of $G(x,k,h)$ is $r^{-h}$. We should think of $x$ as a location parameter: $G(x,k,h)$ is roughly centered at $x$.

Let us note that $G(x,k)=G(x,k,k)$ (so that $\mathrm{card}\,G(x,k,k)\le9$), that

$$\min G(x,k,h)=\min G(x,k)\ ;\quad\max G(x,k,h)=\max G(x,k). \tag{10.61}$$

Furthermore $G(x,k,h)$ increases with $h$, in the sense that $G(x,k,h)\subset G(x,k,h')$ if $h\le h'$, and decreases with $k$, in the sense that if $k\le k'$ then $G(x,k',h)\subset G(x,k,h)$.


Fig. 10.2 The set $G(x,1)$ on top. The spacing between the points is 1/3. The set $G(x,1,2)$ on bottom. The spacing between the points is 1/9. Here $r=3$ and $4/3<x<5/3$
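The sets $G(x,k)$ and $G(x,k,h)$ are easy to generate with exact rational arithmetic. The Python sketch below uses hypothetical helper names and `fractions.Fraction` to avoid rounding problems at the endpoints. It reproduces the situation of Fig. 10.2 with $r=3$, $x=3/2$, $k=1$, $h=2$, and checks (10.59), (10.61) and the monotonicity properties on random inputs.

```python
import math
import random
from fractions import Fraction

def G_xk(x, k, r):
    """G(x,k) = {p r^-k : p in Z, |p r^-k - x| <= 4 r^-k}, see (10.58)."""
    step = Fraction(r) ** (-k)
    lo = math.ceil((x - 4 * step) / step)
    hi = math.floor((x + 4 * step) / step)
    return [p * step for p in range(lo, hi + 1)]

def G_xkh(x, k, h, r):
    """G(x,k,h) = {p r^-h : min G(x,k) <= p r^-h <= max G(x,k)}, see (10.60)."""
    assert h >= k
    g, step = G_xk(x, k, r), Fraction(r) ** (-h)
    return [p * step for p in range(math.ceil(g[0] / step), math.floor(g[-1] / step) + 1)]

def check_sets(trials=500, seed=8):
    rng = random.Random(seed)
    for _ in range(trials):
        r, k = rng.randint(2, 4), rng.randint(-2, 2)
        h = k + rng.randint(0, 2)
        x = Fraction(rng.randint(-10_000, 10_000), 997)
        g, gh = G_xk(x, k, r), G_xkh(x, k, h, r)
        rk = Fraction(r) ** (-k)
        assert len(g) <= 9
        assert x - 4 * rk <= g[0] <= x - 3 * rk                      # (10.59), left
        assert x + 3 * rk <= g[-1] <= x + 4 * rk                     # (10.59), right
        assert gh[0] == g[0] and gh[-1] == g[-1]                     # (10.61)
        assert set(gh) <= set(G_xkh(x, k, h + 1, r))                 # increases with h
        assert set(G_xkh(x, k + 1, h + 1, r)) <= set(G_xkh(x, k, h + 1, r))  # decreases with k
    return True

x = Fraction(3, 2)
print(G_xk(x, 1, 3))           # spacing 1/3, as on top of Fig. 10.2
print(len(G_xkh(x, 1, 2, 3)))  # spacing 1/9, as on the bottom
```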
Definition 10.3.11 For a set $T\subset\ell^2$, integers $k\le h$, a point $w\in\ell^2$, and a subset $I$ of $\mathbb{N}^*$, we define

$$F(T,I,w,k,h)=\mathrm{E}\sup_{t\in T}\sum_{i\in I}\sum_{u\in G(w_i,k,h)^-}\varepsilon_{u,i}\varphi_{u,u^+}(t_i). \tag{10.62}$$

We denote by $d_{I,w,k,h}$ the corresponding distance

$$d_{I,w,k,h}(s,t)^2=\sum_{i\in I}\sum_{u\in G(w_i,k,h)^-}\big(\varphi_{u,u^+}(s_i)-\varphi_{u,u^+}(t_i)\big)^2 \tag{10.63}$$

and by $\Delta(T,I,w,k,h)$ the diameter of $T$ for this distance

$$\Delta(T,I,w,k,h)=\sup_{s,t\in T}d_{I,w,k,h}(s,t). \tag{10.64}$$

When $I=\mathbb{N}^*$ the distance $d_{I,w,k,h}$ is simply the distance $d_{\mathcal{G}}$ of (10.35) when $G_i=G(w_i,k,h)$. The effect of the parameter $w\in\ell^2$ is that the set $G(w_i,k,h)$ is roughly centered around $w_i$.

Even if we forget to mention it again, when writing these expressions, it is always assumed that $h\ge k$.

Let us look at the summation (10.62): decreasing $I$ and increasing $k$ decreases the number of terms in it. This opens the door to the use of Proposition 10.1.1 (Latała's principle).

Let us first point out some regularity properties of these functionals. 


Lemma 10.3.12 If $I'\subset I\subset\mathbb{N}^*$, $k'\ge k$ and $h'\ge h$, then

$$F(T,I',w,k',h')\le F(T,I,w,k,h) \tag{10.65}$$

and

$$\Delta(T,I',w,k',h')\le\Delta(T,I,w,k,h). \tag{10.66}$$

Proof That $F(T,I,w,k,h)$ is an increasing function of $I$ follows from Jensen's inequality, by moving the expectation over the r.v.s $\varepsilon_{u,i}$ for $i\in I\setminus I'$ inside the supremum rather than outside. Next, if $k\le k'\le h$, the inequality $F(T,I,w,k',h)\le F(T,I,w,k,h)$ follows similarly since $G(w_i,k',h)^-\subset G(w_i,k,h)^-$, by moving inside the supremum the expectation with respect to the r.v.s $\varepsilon_{u,i}$ for $u\in G(w_i,k,h)^-\setminus G(w_i,k',h)^-$. That $F(T,I,w,k,h)$ is a decreasing function of $h$ follows from Proposition 10.3.10 and (10.61). The statements concerning $\Delta(T,I,w,k,h)$ are easier, using now (10.47). □
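For a finite set $T$ the diameter (10.64) can be computed exactly, which gives a quick sanity check of (10.66). The Python sketch below uses hypothetical names and exact rational arithmetic; the set $T$, the point $w$ and $r=2$ are arbitrary illustrative choices. It checks that $\Delta$ does not increase when $I$ shrinks, when $k$ increases, or when $h$ increases.

```python
import math
from fractions import Fraction
from itertools import combinations

def phi(u, v, x):
    """phi_{u,v}(x) of (10.24); works with Fractions."""
    return min(v, max(x, u)) - min(v, max(0, u))

def G_xkh(x, k, h, r):
    """G(x,k,h) of (10.60), built from G(x,k) of (10.58)."""
    sk, sh = Fraction(r) ** (-k), Fraction(r) ** (-h)
    lo = math.ceil((x - 4 * sk) / sk) * sk          # min G(x,k)
    hi = math.floor((x + 4 * sk) / sk) * sk         # max G(x,k)
    return [p * sh for p in range(math.ceil(lo / sh), math.floor(hi / sh) + 1)]

def d2(s, t, I, w, k, h, r):
    """d_{I,w,k,h}(s,t)^2 of (10.63)."""
    tot = Fraction(0)
    for i in I:
        g = G_xkh(w[i], k, h, r)
        tot += sum((phi(a, b, s[i]) - phi(a, b, t[i])) ** 2 for a, b in zip(g, g[1:]))
    return tot

def Delta2(T, I, w, k, h, r):
    """Delta(T,I,w,k,h)^2 of (10.64) for a finite T with at least two points."""
    return max(d2(s, t, I, w, k, h, r) for s, t in combinations(T, 2))

r = 2
w = (Fraction(0), Fraction(1, 3), Fraction(-1, 2))
T = [tuple(Fraction(a, 7) for a in row) for row in [(1, -3, 5), (-2, 4, 0), (6, 1, -5)]]
full = Delta2(T, [0, 1, 2], w, 0, 2, r)
print(full)
print(Delta2(T, [0, 2], w, 0, 2, r) <= full)     # smaller I
print(Delta2(T, [0, 1, 2], w, 1, 2, r) <= full)  # larger k
print(Delta2(T, [0, 1, 2], w, 0, 3, r) <= full)  # larger h
```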
Another key idea is that the distances (10.63) associated with the functionals (10.62) relate well to the distance considered in (9.12) in Theorem 9.2.4 (in the case where $\nu$ is the counting measure). Our next lemmas provide the main step in this direction.


Lemma 10.3.13 Consider $x,y,z\in\mathbb{R}$ and assume that $|y-x|\le2r^{-k}$. Then

$$|y-z|^2\wedge r^{-2h}\le2\sum_{u\in G(x,k,h)^-}\big(\varphi_{u,u^+}(y)-\varphi_{u,u^+}(z)\big)^2. \tag{10.67}$$

Proof First we reduce to the case where $|y-z|\le r^{-h}$. To do this, we replace $z$ by the closest point to $z$ in the interval $[y-r^{-h},y+r^{-h}]$. This does not change the left-hand side of (10.67) and decreases the right-hand side because the functions $\varphi_{u,u^+}$ are non-decreasing. So we assume now $|y-z|\le r^{-h}$. Since $|y-x|\le2r^{-k}$ we have $x-3r^{-k}\le y,z\le x+3r^{-k}$, so that (10.59) implies that $\min G(x,k,h)\le y,z\le\max G(x,k,h)$. Then by the equality case of Lemma 10.3.1, we have

$$|y-z|=\sum_{u\in G(x,k,h)^-}|\varphi_{u,u^+}(y)-\varphi_{u,u^+}(z)|.$$

Now, since $|y-z|\le r^{-h}$, there are at most two non-zero terms in the right-hand side, and $(a+b)^2\le2(a^2+b^2)$. □
Lemma 10.3.14 Consider $s,t,w\in\ell^2$. Consider a set $I$ of integers and assume that

$$\forall i\in I,\quad|s_i-w_i|\le2r^{-k}. \tag{10.68}$$

Then

$$\sum_{i\in I}|s_i-t_i|^2\wedge r^{-2h}\le2d_{I,w,k,h}(s,t)^2. \tag{10.69}$$

Proof We use (10.67) for $x=w_i$, $y=s_i$, $z=t_i$ to obtain

$$|s_i-t_i|^2\wedge r^{-2h}\le2\sum_{u\in G(w_i,k,h)^-}\big(\varphi_{u,u^+}(s_i)-\varphi_{u,u^+}(t_i)\big)^2.$$

We then sum over $i\in I$ and we use (10.63). □
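Lemma 10.3.13 lends itself to an exact randomized test. The Python sketch below uses hypothetical names, `Fraction` to keep the arithmetic exact, and the arbitrary choice $r=2$. It draws $x$, $y$ with $|y-x|\le2r^{-k}$ and an arbitrary $z$, and checks (10.67); summing these inequalities over $i\in I$ is exactly how (10.69) is obtained.

```python
import math
import random
from fractions import Fraction

def phi(u, v, x):
    """phi_{u,v}(x) of (10.24); works with Fractions."""
    return min(v, max(x, u)) - min(v, max(0, u))

def G_xkh(x, k, h, r):
    """G(x,k,h) of (10.60), built from G(x,k) of (10.58)."""
    sk, sh = Fraction(r) ** (-k), Fraction(r) ** (-h)
    lo = math.ceil((x - 4 * sk) / sk) * sk          # min G(x,k)
    hi = math.floor((x + 4 * sk) / sk) * sk         # max G(x,k)
    return [p * sh for p in range(math.ceil(lo / sh), math.floor(hi / sh) + 1)]

def check_10_67(r=2, trials=1_000, seed=3):
    """Exact randomized test of (10.67)."""
    rng = random.Random(seed)
    for _ in range(trials):
        k = rng.randint(-1, 2)
        h = k + rng.randint(0, 3)
        rk = Fraction(r) ** (-k)
        x = Fraction(rng.randint(-1000, 1000), 97)
        y = x + rk * Fraction(rng.randint(-200, 200), 100)    # |y - x| <= 2 r^-k
        z = y + Fraction(rng.randint(-5000, 5000), 997)        # z is arbitrary
        g = G_xkh(x, k, h, r)
        rhs = 2 * sum((phi(a, b, y) - phi(a, b, z)) ** 2 for a, b in zip(g, g[1:]))
        lhs = min((y - z) ** 2, Fraction(r) ** (-2 * h))
        assert lhs <= rhs, (x, y, z, k, h)
    return True

print(check_10_67())
```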
10.4 Philosophy, II


In this section we try to shed some light on the construction we have started. Let us recall our goal: starting with $T\subset\ell^2$, we try to decompose each $t\in T$ as $t=t^1+t^2$ in such a way that $\{t^1\,;\ t\in T\}$ is well behaved and that $\|t^2\|_1\le Lb(T)$. For the purpose of the philosophical discussions, we will call $t^2$ the peaky part of $t$, even though the name is not really appropriate.

To prove the required decomposition, we will recursively construct an increasing sequence of partitions of $T$, and then the decomposition of $T$ will be provided by Theorem 9.2.4 (used for the counting measure). At each level, for each element $A$ of the partition, we will control a certain diameter $\Delta(A,I,w,k,h)$ and the corresponding functional $F(A,I,w,k,h)$. The first thought is that in order for the functional (10.62) to really bear on $A$, “$A$ should be chosen close to $w$”. The precise meaning of this will be understood later, but for the time being, we keep in mind the idea that $w$ provides information about the “location” of $A$. A second thought that comes to mind is that information seems to be lost when there are coordinates $i$ and elements $t\in A$ with $|w_i-t_i|>4r^{-k}$, see Exercise 10.3.2. This looks like a serious problem: the Latała-Bednorz theorem is absolutely sharp; we can never allow any essential information to be lost. The solution to that riddle was given at the end of Sect. 9.3: what we really need to keep track of are the values of $\pi_n(t)_i$, and this we really do as we explain below.

The details of what happens will be given as the proof develops, but we start to reveal some secrets. In order to be able to apply Theorem 9.2.4 (when $\nu$ is the counting measure), we need to have the crucial condition

$$\forall t\in T,\ \forall n\ge0,\quad\sum_{i\in\Omega_n(t)}|r^{j_n(t)}(t_i-\pi_n(t)_i)|^2\wedge1\le u2^n, \tag{10.70}$$

where⁷

$$\Omega_n(t)=\{i\,;\ 0\le q<n\Rightarrow|\pi_{q+1}(t)_i-\pi_q(t)_i|\le r^{-j_q(t)}\}. \tag{10.71}$$

Not being precise about $\Omega_n(t)$ yet, we see that our best shot is to deduce (10.70) from (10.69) used for $s=\pi_n(t)$. We then guess that the number $h$ relates to $j_n(t)$. We also realize the importance of the condition

$$\forall i\in I,\quad|\pi_n(t)_i-w_i|\le2r^{-k}. \tag{10.72}$$


The value of $k$ will be decided later, but we should form the following picture: the value of $k$ tells us that the range around $w_i$ where we are getting some information on the $i$-th coordinate of the points of $A$ is about $r^{-k}$. Most importantly, the value of $\pi_n(t)_i$ falls well within this range. It is necessary to keep this range, which might be much larger than $r^{-j_n(t)}$, because we do not have better information on the location of $\pi_n(t)$ than (10.72).

⁷ In words, the points of $\Omega_n(t)$ are those $i$ for which the sequence $(\pi_q(t)_i)_{q\le n}$ does not have big jumps.


10.5 Latała’s Step


We now state and prove the key new step in the proof of Theorem 6.2.8 (compared 
with the Gaussian case Theorem 2.10.1). 


Proposition 10.5.1 There exists a constant $L_2$ with the following property. Consider $w,w'\in\ell^2$, a set $I\subset\mathbb{N}^*$, and integers $k\le h$. Consider a subset $T$ of $\ell^2$ such that

$$\Delta(T,I,w,k,h+2)\le c. \tag{10.73}$$

Assume that for a certain number $\sigma$

$$c\le\frac{\sigma}{L_2}\ ;\quad r^{-h-1}\sqrt{\log m}\le\sigma. \tag{10.74}$$

Let⁸

$$I'=\{i\in I\,;\ |w_i-w'_i|\le2r^{-k}\}. \tag{10.75}$$

Then we can find $m'\le m+1$ and a partition $(A_\ell)_{\ell\le m'}$ of $T$ such that for each $\ell\le m'$ we have either

$$\Delta(A_\ell,I,w,k,h+1)\le\sigma \tag{10.76}$$

or else

$$F(A_\ell,I',w',h+2,h+2)\le F(T,I,w,k,h+1)-\frac{\sigma}{L}\sqrt{\log m}, \tag{10.77}$$

$$\Delta(A_\ell,I',w',h+2,h+2)\le c. \tag{10.78}$$


In words, the proposition states that each piece produced by this decomposition is either such that its diameter for the large distance is small (when (10.76) holds), or else its size measured by the proper functionals has decreased (when (10.77) holds).

⁸ The reader should notice that the set $I'$ is not constructed, but is known once $w$ and $w'$ are known.
The fundamental point of this result is that the hypothesis on $T$ involves a control of $\Delta(T,I,w,k,h+2)$, not of the larger quantity $\Delta(T,I,w,k,h+1)$, whereas the size of the pieces $A_\ell$ in (10.76) involves a control of the larger quantity $\Delta(A_\ell,I,w,k,h+1)$, not of the smaller quantity $\Delta(A_\ell,I,w,k,h+2)$.


Proof The proof relies on Latała's principle, Proposition 10.1.1, but requires some skill. There is no loss of generality in assuming, for notational convenience, that $I=\mathbb{N}^*$. For $i\in\mathbb{N}^*$ consider the sets

$$G_i=G(w_i,k,h+1),$$

and $\mathcal{G}=(G_i)_{i\ge1}$. For $i\in\mathbb{N}^*\setminus I'$ let $G'_i=G_i$, and for $i\in I'$ define $G'_i=G_i\cup G(w'_i,h+2,h+2)$; define $\mathcal{G}'=(G'_i)_{i\ge1}$. The central object of the proof is the process $(X_t(\mathcal{G}'))_{t\in T}$.

First we observe that since $r\ge2$ and $h\ge k$, and using (10.59) and (10.61),

$$G(w'_i,h+2,h+2)\subset[w'_i-4r^{-h-2},w'_i+4r^{-h-2}]\subset[w'_i-r^{-k},w'_i+r^{-k}],$$

and since $|w_i-w'_i|\le2r^{-k}$ for $i\in I'$, it follows that

$$G(w'_i,h+2,h+2)\subset[w_i-3r^{-k},w_i+3r^{-k}].$$

Consequently, from (10.59), we have

$$\max G(w'_i,h+2,h+2)\le w_i+3r^{-k}\le\max G(w_i,k,k)=\max G(w_i,k,h+1)=\max G_i.$$

Proceeding similarly for the min shows that the sets $G_i$ and $G'_i$ satisfy (10.46). Therefore by (10.52) we have

$$\mathrm{E}\sup_{t\in T}X_t(\mathcal{G}')\le\mathrm{E}\sup_{t\in T}X_t(\mathcal{G})=F(T,\mathbb{N}^*,w,k,h+1). \tag{10.79}$$

Next, since $\mathrm{card}\,G(w'_i,h+2,h+2)\le9$ and $G'_i=G_i\cup G(w'_i,h+2,h+2)$, for each $i$ and $u\in G_i^-$ we have $\mathrm{card}([u,u^+[\,\cap G'_i)\le\mathrm{card}([u,u^+[\,\cap G_i)+\mathrm{card}\,G(w'_i,h+2,h+2)\le1+9=10$. Thus the sets $G_i$ and $G'_i$ satisfy

$$\forall i\in\mathbb{N}^*,\ \forall u\in G_i^-,\quad\mathrm{card}\big([u,u^+[\,\cap G'_i\big)\le10,$$

and Proposition 10.3.9 (a), used for $q=10$ (note that $\sqrt{10}\le4$), implies that

$$d_{\mathcal{G}}\le4d_{\mathcal{G}'}. \tag{10.80}$$

Fig. 10.3 The sets $G'_i$ when $r=8$ and $h=k+1$. The bottom of the figure represents the part of the set contained in the box after magnification by a factor 8


We can appreciate the magic of this proof: neither the process $(X_t(\mathcal{G}'))_{t\in T}$ nor the distance it induces is exactly what we need, but they are related to the quantities of interest through the inequalities (10.79) and (10.80), which turn out to be in the right direction.

For i € I’ we have |w; — w;| < 2r—* so that (using that h > k andr > 2) 
—h-2 


—h-2 h-2 


|pr —w;| <4r => |pr- — w;| <2r-* + 4r-*? < 3r-* 
and hence by (10.59) again $G(w'_i, h+2, h+2) \subset G(w_i, k, h+2)$. In fact the points of the left-hand set are consecutive points of the right-hand set. Using (10.45) for $q = 1$ in the inequality (with $I'$ instead of $\mathbb{N}^*$), we obtain
$$d_{I', w', h+2, h+2} \le d_{I', w, k, h+2}, \tag{10.81}$$
and this proves (10.78). Consequently⁹
$$\Delta(T, I', w', h+2, h+2) \le \Delta(T, I', w, k, h+2) \le \Delta(T, \mathbb{N}^*, w, k, h+2) \le c. \tag{10.82}$$


Let us consider the set $J^*$ as in (10.33), where the family $G$ has been replaced by the family $G'$, so that $J^* = \{(i,u);\ i \in \mathbb{N}^*,\ u \in G'_i\}$. Let us consider the corresponding map $\Phi$ as in (10.34). Thus
$$s, t \in T \Rightarrow d_{G'}(s,t) = \|\Phi(s) - \Phi(t)\|_2, \tag{10.83}$$
where the norm on the right is in $\ell^2(J^*)$, and, using (10.79) in the inequality,
$$b(\Phi(T)) = \mathbb{E}\sup_{t \in T} X_t(G') \le F(T, \mathbb{N}^*, w, k, h+1). \tag{10.84}$$


⁹ Please observe what is happening here. The information (10.78) is a consequence of (10.73), not of the fact that we have partitioned the set $T$. This is coherent with our proof of Latała's principle. There is only one piece that satisfies (10.77) and (10.78). This piece is what is left of $T$ after we have removed some parts which satisfy (10.76), and we took no action to decrease its diameter in any sense.
Consider the set $J \subset J^*$ given by
$$J = \{(i,u);\ i \in I',\ u \in G(w'_i, h+2, h+2)\}. \tag{10.85}$$
We will use Proposition 10.1.1, replacing the countable set $\mathbb{N}^*$ by $J^*$. With the notation $b_J$ of Proposition 10.1.1, for $A \subset T$,
$$b_J(\Phi(A)) = F(A, I', w', h+2, h+2). \tag{10.86}$$


The goal is to apply Proposition 10.1.1 to the set $\Phi(T)$ with $\sigma' = \sigma/8$ instead of $\sigma$ and with set of indices $J^*$ instead of $\mathbb{N}^*$. For this we check (10.1) to (10.3). First, (10.82) implies (10.1). Next, (10.2) holds since $\|t\|_\infty \le r^{-h-1}$ for $t \in \Phi(T)$ and since $r^{-h-1}\sqrt{\log m} \le \sigma = 8\sigma'$ by (10.74). Finally (10.3), i.e., the condition $c \le \sigma'/L_1$, follows from $c \le \sigma/L_2$ provided that $L_2 = 8L_1$. Thus we can apply Proposition 10.1.1. We then find a partition $(B_\ell)_{\ell \le m'}$ of $\Phi(T)$ such that for each $\ell \le m'$, we have either
$$\exists t \in \Phi(T),\quad B_\ell \subset B(t, \sigma/8), \tag{10.87}$$
or else
$$b_J(B_\ell) \le b(\Phi(T)) - \frac{\sigma}{L}\sqrt{\log m}. \tag{10.88}$$
We then set $A_\ell = \Phi^{-1}(B_\ell)$. When $B_\ell$ satisfies (10.87), (10.83) implies that the diameter of $A_\ell$ for the distance $d_{G'}$ is $\le \sigma/4$, and since $d_G \le 4d_{G'}$ by (10.80), its diameter for the distance $d_G$ is $\le \sigma$. This distance is exactly the distance used in computing the diameter in (10.76). When $B_\ell$ satisfies (10.88), then $A_\ell$ satisfies (10.77), as follows from (10.84) and (10.86), and (10.78) follows from (10.82). □


10.6 Philosophy, II 


There is a simple idea behind the occurrence of $w$ and $w'$. We have to think of $w$ as providing information about the "location" of $T$. As we keep splitting the sets of our partitions, we keep improving the information about their "location". The change from $w$ to $w'$ in Proposition 10.5.1 reflects the fact that we now have more accurate information than the one we used in the previous steps of the construction: the location of the set $T$ is actually better described using a slightly different point $w'$ than using $w$.¹⁰

An intriguing feature is that when we are in the case (10.77), we will obviously need to replace $I$ by $I'$. Why then is crucial information not lost? The answer lies in the mechanism explained in Sect. 9.3: at the time we drop the coordinates in $I \setminus I'$, the decomposition will already have been determined on these coordinates. There will be a clever device to ensure that, which we will explain later.

Another intriguing feature of (10.77) is that on the left we have replaced the value $k$ by the potentially much larger value $h+2$. It is absolutely essential to be able to do that to ensure that $\operatorname{card} G(w'_i, h+2, h+2)$ remains bounded (in fact $\le 9$), because this is how we obtain the inequality (10.80), which is essential to obtain (10.76). But why can we afford to do that without losing critical information? At a high level the answer is that we have a much better idea of the "localization" of the piece $A_\ell$ than of $T$ (for the simple reason that $A_\ell$ is a "small part" of $T$), but the true mechanism is related to the fact that what really matters is the condition (10.72), and you have to wait a few more pages, until Sect. 10.13, to have it explained in words.

Finally it cannot hurt to stress again the magic of this proposition, which lies in the use of the set (10.85). It is the use of this set which allows us to use the weak hypothesis (10.73) while reaching the strong conclusion (10.76).


10.7 A Decomposition Lemma


Besides Proposition 10.5.1, we need another decomposition principle, very similar to what we did in Lemma 2.9.4 in the Gaussian case, which is just a reformulation of Lemma 6.6.4 (with $a = c/2$). Here $\Delta$ denotes the diameter for the $\ell^2$ distance.


Lemma 10.7.1 There exists a universal constant $L_3$ with the following property. Consider a set $T \subset \ell^2$ and $b, c > 0$. Assume that $\|t\|_\infty \le b$ for all $t \in T$. Consider $m \ge 2$ with $b\sqrt{\log m} \le c$. Then we can find $m' \le m$ and a partition $(B_\ell)_{\ell \le m'}$ of $T$ such that for each $\ell \le m'$ we have either
$$\forall D \subset B_\ell;\ \Delta(D) \le \frac{c}{L_3} \Rightarrow b(D) \le b(T) - \frac{c}{L_3}\sqrt{\log m} \tag{10.89}$$
or else
$$\Delta(B_\ell) \le c. \tag{10.90}$$
We will need the following special case for our construction: 
¹⁰ You can try to visualize things that way: we use one single point $w$ to describe the "position" of $T$. When we break $T$ into small pieces, this gets easier. Furthermore, the position of some of the small pieces is better described by another point than $w$.
Corollary 10.7.2 Consider a set $T \subset \ell^2$ and $w \in \ell^2$. Consider $I \subset \mathbb{N}^*$, $c > 0$, and integers $k \le h$. Assume that $r^{-h}\sqrt{\log m} \le c$. Then we can find $m' \le m$ and a partition $(A_\ell)_{\ell \le m'}$ of $T$ such that for each $\ell \le m'$ we have either
$$\forall D \subset A_\ell;\ \Delta(D, I, w, k, h) \le \frac{c}{L_3} \Rightarrow F(D, I, w, k, h) \le F(T, I, w, k, h) - \frac{c}{L_3}\sqrt{\log m} \tag{10.91}$$
or else
$$\Delta(A_\ell, I, w, k, h) \le c. \tag{10.92}$$


Proof For notational convenience we assume $I = \mathbb{N}^*$. Set $G_i = G(w_i, k, h)$ and consider the set $J^*$ and the map $\Phi$ as in (10.33) and (10.34). Then for a subset $A$ of $T$, we have $b(\Phi(A)) = F(A, I, w, k, h)$. We construct a partition $(B_\ell)_{\ell \le m'}$ of $\Phi(T)$ using Lemma 10.7.1 with $b = r^{-h}$ and we set $A_\ell = \Phi^{-1}(B_\ell)$. □


We can now state and prove the basic tool to construct partitions. 


Lemma 10.7.3 Assuming that $r$ is large enough, $r \ge L$, the following holds. Consider an integer $n \ge 2$. Consider a set $T \subset \ell^2$, a point $w \in \ell^2$, a subset $I \subset \mathbb{N}^*$, and integers $k \le j$. Then we can find $m \le N_n$ and a partition $(A_\ell)_{\ell \le m}$ such that for each $\ell \le m$, we have one of the following three properties:

(a) We have
$$\forall D \subset A_\ell;\ \Delta(D, I, w, k, j+2) \le \frac{1}{L_4} 2^{n/2} r^{-j-1} \Rightarrow F(D, I, w, k, j+2) \le F(T, I, w, k, j+2) - \frac{1}{L_4} 2^n r^{-j-1}, \tag{10.93}$$
or

(b)
$$\Delta(A_\ell, I, w, k, j+1) \le 2^{n/2} r^{-j-1}, \tag{10.94}$$
or else

(c) There exists $w' \in T$ such that for $I' = \{i \in I;\ |w'_i - w_i| \le 2r^{-k}\}$ we have
$$F(A_\ell, I', w', j+2, j+2) \le F(T, I, w, k, j+1) - \frac{1}{L} 2^n r^{-j-1}, \tag{10.95}$$
$$\Delta(A_\ell, I', w', j+2, j+2) \le 2^{n/2} r^{-j}. \tag{10.96}$$
At a high level, this lemma partitions $T$ into not too many sets on which we have additional information. In the case (a) the new information is that the subsets of $A_\ell$ of small diameter have a smaller size (as measured by the appropriate functional). In the case (b) it is the set $A_\ell$ itself which has a small diameter. These two cases are very similar to what happens in Lemma 2.9.4. This was to be expected, since these cases are produced by the application of Lemma 10.7.1, which is very similar to Lemma 2.9.4. The really new feature is case (c), where again the size of the set $A_\ell$ has decreased as measured by an appropriate functional, while at the same time we control the diameter of $A_\ell$ (but for a much smaller distance than in case (b)). What is harder to visualize (but is absolutely essential) is the precise choice of the parameters in the distances and the functionals involved. It is absolutely essential that the condition on $D$ in (10.93) bears on $\Delta(D, I, w, k, j+2)$, whereas the condition on $A_\ell$ in (10.94) bears on $\Delta(A_\ell, I, w, k, j+1)$. Let us also note that in the case (c) we have in particular, using (10.65) (i.e., the monotonicity in $h$ and $k$),
$$F(A_\ell, I', w', j+2, j+2) \le F(T, I, w, k, j) - \frac{1}{L} 2^n r^{-j-1}. \tag{10.97}$$


Proof The principle of the proof is to apply first Corollary 10.7.2 and then to split 
again the resulting pieces of small diameter using Proposition 10.5.1. 

Let us define $m = N_{n-1} - 1$. Since we assume $n \ge 2$, we have $2^{n/2}/L \le \sqrt{\log m} \le 2^{n/2}$. Let us set $c = 2^{n/2} r^{-j-1}/L_2$, so that $c\sqrt{\log m} \ge 2^n r^{-j-1}/L$ and $r^{-j-2}\sqrt{\log m} \le r^{-j-2} 2^{n/2} \le L_2 c/r$. Assuming $r \ge L_2$ we then have $r^{-j-2}\sqrt{\log m} \le c$.

Let us recall the constant $L_3$ of Corollary 10.7.2. We then apply this corollary with these values of $k, c, m$, and with $h = j+2$.¹¹ This produces pieces $(C_\ell)_{\ell \le m'}$ with $m' \le m$ which satisfy either
$$\forall D \subset C_\ell;\ \Delta(D, I, w, k, j+2) \le \frac{c}{L_3} \Rightarrow F(D, I, w, k, j+2) \le F(T, I, w, k, j+2) - \frac{c}{L_3}\sqrt{\log m} \tag{10.98}$$
or else
$$\Delta(C_\ell, I, w, k, j+2) \le c. \tag{10.99}$$


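Let us set $L_4 = \sqrt{2}L_2L_3$, so that
$$\frac{c}{L_3} \ge \frac{1}{L_4} 2^{n/2} r^{-j-1};\qquad \frac{c}{L_3}\sqrt{\log m} \ge \frac{1}{L_4} 2^n r^{-j-1}. \tag{10.100}$$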


¹¹ The real reason to use $j+2$ rather than $j+1$ will become apparent only later. In short, it is because in (10.98) we absolutely need to have a condition bearing on $\Delta(D, I, w, k, j+2)$, not on the larger quantity $\Delta(D, I, w, k, j+1)$.
Thus the pieces $C_\ell$ of the partition which satisfy (10.98) also satisfy (10.93). We are done with these pieces.

The other pieces $C_\ell$ of the partition satisfy (10.99), that is, they satisfy (10.73) for $h = j$. Let us fix $w' \in T$. We split again these pieces $C_\ell$ into pieces $(C_{\ell,\ell'})_{\ell' \le m+1}$ using Proposition 10.5.1, with these values of $k, m$, with $h = j$, $\sigma = 2^{n/2} r^{-j-1}$ and $c = 2^{n/2} r^{-j-1}/L_2 = \sigma/L_2$. Each of the resulting pieces $C_{\ell,\ell'}$ satisfies either
$$\Delta(C_{\ell,\ell'}, I, w, k, j+1) \le \sigma = 2^{n/2} r^{-j-1}, \tag{10.101}$$
and then we are in the case (b), or else (using that $\sigma\sqrt{\log m} \ge 2^n r^{-j-1}/L$) they satisfy
$$F(C_{\ell,\ell'}, I', w', j+2, j+2) \le F(T, I, w, k, j+1) - \frac{1}{L} 2^n r^{-j-1}, \tag{10.102}$$
$$\Delta(C_{\ell,\ell'}, I', w', j+2, j+2) \le c \le 2^{n/2} r^{-j}, \tag{10.103}$$
and then we are in the case (c).
Finally, the total number of pieces produced is $\le m(m+1) \le N_{n-1}^2 = N_n$. □
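The piece count and the size of $\sqrt{\log m}$ used in this proof are elementary arithmetic. The following Python sketch checks them numerically for small $n$. It assumes the book's usual convention $N_n = 2^{2^n}$ and the natural logarithm; taking $L = 2$ in the lower bound $2^{n/2}/L \le \sqrt{\log m}$ is an illustrative choice, not the constant of the text.

```python
import math


def N(n: int) -> int:
    """N_n = 2**(2**n), the cardinality bound for the n-th partition."""
    return 2 ** (2 ** n)


def lemma_10_7_3_counts(n: int, L: float = 2.0) -> dict:
    """Check the counting facts used in the proof of Lemma 10.7.3 for one n >= 2."""
    m = N(n - 1) - 1                     # m = N_{n-1} - 1
    pieces = m * (m + 1)                 # at most m pieces C_l, each cut into <= m + 1 pieces
    root_log_m = math.sqrt(math.log(m))  # natural logarithm
    return {
        "pieces_ok": pieces <= N(n - 1) ** 2 and N(n - 1) ** 2 == N(n),
        "upper_ok": root_log_m <= 2 ** (n / 2),      # sqrt(log m) <= 2^{n/2}
        "lower_ok": 2 ** (n / 2) / L <= root_log_m,  # 2^{n/2} / L <= sqrt(log m)
    }


for n in range(2, 7):
    print(n, lemma_10_7_3_counts(n))
```

Python's integers are unbounded and `math.log` accepts them, so the check stays exact on the counting side even though $N_n$ grows doubly exponentially.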


10.8 Building the Partitions 


We will prove the basic partitioning result by iterating Lemma 10.7.3. A remarkable new feature of this construction is that the functionals we use depend on the set we partition. We recall the constant $L_4$ of Lemma 10.7.3. We fix an integer $\kappa \ge 3$ with $2^{\kappa/2} \ge 2L_4$, and we set $r = 2^\kappa$ (so that $r$ is now a universal constant $\ge 8$).
Consider a set $T \subset \ell^2$ with $0 \in T$. We plan to construct by induction over $n \ge 0$ an increasing sequence $(\mathcal{A}_n)$ of partitions of $T$, with $\operatorname{card}\mathcal{A}_n \le N_n$. To each $A \in \mathcal{A}_n$ we will attach a set $I_n(A) \subset \mathbb{N}^*$, a point $w_n(A) \in \ell^2$, and integers $k_n(A) \le j_n(A) \in \mathbb{Z}$, $0 \le p_n(A) \le 4\kappa - 1$. We are going to explain soon the meaning of these quantities, and in particular of the integer $p_n(A)$. Let us right away introduce some basic notation. For $n \ge 0$, $A \in \mathcal{A}_n$, and $D \subset T$, we define


$$\Delta_{n,A}(D) := \Delta(D, I_n(A), w_n(A), k_n(A), j_n(A)). \tag{10.104}$$
We will use this quantity to measure the size of the subsets of $A$. The following is obvious from the definition (10.104):

Lemma 10.8.1 Assume that $B \in \mathcal{A}_n$ and $A \in \mathcal{A}_{n+1}$ and $I_{n+1}(A) = I_n(B)$, $w_{n+1}(A) = w_n(B)$, $k_{n+1}(A) = k_n(B)$, $j_{n+1}(A) = j_n(B)$. Then for $D \subset T$ we have $\Delta_{n,B}(D) = \Delta_{n+1,A}(D)$.
To start the construction, we set $n_0 = 2$. For $n \le n_0$ we set $\mathcal{A}_n = \{T\}$, $I_n(T) = \mathbb{N}^*$, $w_n(T) = 0$, $p_n(T) = 0$ and $k_n(T) = j_n(T) = j_0$, where $j_0$ satisfies $\Delta(T) \le r^{-j_0}$.

For $n \ge n_0 = 2$ we will spell out rules by which we split an element $B \in \mathcal{A}_n$ into elements of $\mathcal{A}_{n+1}$ and how we attach the various quantities above to each newly formed element of $\mathcal{A}_{n+1}$. We will also show that certain relations are inductively satisfied. Two such conditions are absolutely central and bear on a certain diameter of $A$:


$$\forall A \in \mathcal{A}_n,\quad p_n(A) = 0 \Rightarrow \Delta_{n,A}(A) \le 2^{n/2} r^{-j_n(A)}, \tag{10.105}$$
$$\forall A \in \mathcal{A}_n,\quad p_n(A) > 0 \Rightarrow \Delta_{n,A}(A) \le 2^{(n - p_n(A))/2} r^{-j_n(A)+2}. \tag{10.106}$$
Let us observe that (10.106) gets more restrictive as $p_n(A)$ increases and that for the small values of $p_n(A)$ (e.g., $p_n(A) = 1$), this condition is very much weaker than (10.105) because of the extra factor $r^2$.

When $p_n(B) \ge 1$, observe first that from (10.106) we have
$$\Delta_{n,B}(B) \le 2^{(n - p_n(B))/2} r^{-j_n(B)+2}. \tag{10.107}$$
The rule for splitting $B$ in that case is simple: we don't. We decide that $B \in \mathcal{A}_{n+1}$, and we set $I_{n+1}(B) = I_n(B)$, $w_{n+1}(B) = w_n(B)$, $k_{n+1}(B) = k_n(B)$, $j_{n+1}(B) = j_n(B)$. For further reference, let us state
$$p_n(B) > 0 \Rightarrow B \in \mathcal{A}_{n+1},\ j_{n+1}(B) = j_n(B). \tag{10.108}$$


To define $p_{n+1}(B)$, we proceed as follows:

• If $p_n(B) < 4\kappa - 1$ we set $p_{n+1}(B) = p_n(B) + 1$.
• If $p_n(B) = 4\kappa - 1$ we set $p_{n+1}(B) = 0$.

When $p_{n+1}(B) = p_n(B) + 1 > 0$, we have to prove that $B \in \mathcal{A}_{n+1}$ satisfies (10.106), that is
$$\Delta_{n+1,B}(B) \le 2^{(n+1-p_{n+1}(B))/2} r^{-j_{n+1}(B)+2}.$$
This follows from (10.107), Lemma 10.8.1, and the fact that $(n+1) - p_{n+1}(B) = n - p_n(B)$.
When $p_{n+1}(B) = 0$ we have to prove that $B \in \mathcal{A}_{n+1}$ satisfies (10.105). Using Lemma 10.8.1 in the equality and (10.107) in the inequality, and recalling that $p_n(B) = 4\kappa - 1$,
$$\Delta_{n+1,B}(B) = \Delta_{n,B}(B) \le 2^{(n-4\kappa+1)/2} r^{-j_n(B)+2} = 2^{(n+1)/2} r^{-j_{n+1}(B)}, \tag{10.109}$$
since $2^{-2\kappa} = r^{-2}$.
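Both the choice of $\kappa$ and the identity in (10.109) reduce to arithmetic with powers of 2. The Python sketch below compares, exactly via `fractions.Fraction`, the base-2 logarithms of the two sides of the last equality in (10.109) with $r = 2^\kappa$, and checks that the smallest admissible $\kappa$ gives $2^{(1-\kappa)/2} \le 1/L_4$. The sample values of $L_4$ are arbitrary: the text only asserts that $L_4$ is some universal constant.

```python
from fractions import Fraction


def choose_kappa(L4: float) -> int:
    """Smallest integer kappa >= 3 with 2**(kappa/2) >= 2*L4 (then r = 2**kappa)."""
    kappa = 3
    while 2 ** (kappa / 2) < 2 * L4:
        kappa += 1
    return kappa


def log2_before(n: int, j: int, kappa: int) -> Fraction:
    """log2 of 2^{(n - 4*kappa + 1)/2} * r^{-j + 2}, with r = 2**kappa."""
    return Fraction(n - 4 * kappa + 1, 2) + kappa * (2 - j)


def log2_after(n: int, j: int, kappa: int) -> Fraction:
    """log2 of 2^{(n + 1)/2} * r^{-j}, with r = 2**kappa."""
    return Fraction(n + 1, 2) - kappa * j


for L4 in (1.0, 3.0, 10.5):
    kappa = choose_kappa(L4)
    print(L4, kappa, 2 ** kappa, 2 ** ((1 - kappa) / 2) <= 1 / L4)
```

Working with exponents rather than with the numbers themselves keeps the comparison exact for any $n$ and $j$.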
The integer $p_n(B)$ is a kind of counter. When $p_n(B) > 0$, this tells us that we are not permitted to split $B$, and we increment the counter, $p_{n+1}(B) = p_n(B) + 1$, unless $p_n(B) = 4\kappa - 1$, in which case we set $p_{n+1}(B) = 0$, which means we will split the set at the next step. More generally, the value of the counter tells us in how many steps we will split $B$: we will split $B$ in $4\kappa - p_n(B)$ steps.
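The counter rule is a small state machine. The following Python sketch implements only the two update rules stated above ($p \mapsto p+1$ for $p < 4\kappa - 1$, and $p \mapsto 0$ for $p = 4\kappa - 1$) and counts the steps until the counter is back to $0$, i.e., until $B$ falls under the splitting rule for $p_n(B) = 0$.

```python
def next_counter(p: int, kappa: int) -> int:
    """One step of the counter of a set that is not split (p >= 1)."""
    return p + 1 if p < 4 * kappa - 1 else 0


def steps_until_split(p: int, kappa: int) -> int:
    """Number of steps until a counter started at p in [1, 4*kappa - 1] is back to 0."""
    if not 1 <= p <= 4 * kappa - 1:
        raise ValueError("the counter of an unsplit set lies in [1, 4*kappa - 1]")
    steps = 0
    while p != 0:
        p = next_counter(p, kappa)
        steps += 1
    return steps


# A piece created in case (c) starts with p = 1: it is left alone for 4*kappa - 1 steps.
print(steps_until_split(1, kappa=3))  # prints 11
```

For every admissible starting value the count equals $4\kappa - p_n(B)$, which is the claim made in the text.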

Let us now examine the main case, $p_n(B) = 0$. In that case we split $B$ into at most $N_n$ pieces using Lemma 10.7.3, with $I = I_n(B)$, $w = w_n(B)$, $j = j_n(B)$, and $k = k_n(B)$. There are three cases to consider.

(a) We are in case (a) of Lemma 10.7.3; the piece $A$ produced has property (10.93). We define $p_{n+1}(A) = 0$. We then set
$$I_{n+1}(A) = I_n(B),\quad w_{n+1}(A) = w_n(B),\quad j_{n+1}(A) = j_n(B),\quad k_{n+1}(A) = k_n(B). \tag{10.110}$$
(b) We are in case (b) of Lemma 10.7.3, and the piece $A$ we produce has property (10.94). We then set $p_{n+1}(A) = 0$, $j_{n+1}(A) = j_n(B) + 1$, and we define
$$I_{n+1}(A) = I_n(B),\quad w_{n+1}(A) = w_n(B),\quad k_{n+1}(A) = k_n(B). \tag{10.111}$$
(c) We are in case (c) of Lemma 10.7.3; the piece $A$ produced has properties (10.95) and (10.96). We set $p_{n+1}(A) = 1$, and we define
$$j_{n+1}(A) = k_{n+1}(A) = j_n(B) + 2.$$
We define $w_{n+1}(A) = w' \in B$ and
$$I_{n+1}(A) = \{i \in I_n(B);\ |w_{n+1}(A)_i - w_n(B)_i| \le 2r^{-k_n(B)}\}, \tag{10.112}$$
so that in particular $I_{n+1}(A) \subset I_n(B)$.


In order to try to make sense of this, let us start with some very simple observations. We consider $B \in \mathcal{A}_n$ with $n \ge 0$ and $A \in \mathcal{A}_{n+1}$, $A \subset B$.

• In cases (a) and (b), we do not change the value of the counter: $p_{n+1}(A) = p_n(B) = 0$. Only in case (c) do we change this value, by setting $p_{n+1}(A) = 1$. This has the effect that the piece $A$ will not be split in the next $4\kappa - 1$ steps, but will be split again exactly $4\kappa$ steps from now.

• There is a simple relation between $j_{n+1}(A)$ and $j_n(B)$. It should be obvious from our construction that the following conditions hold:
$$j_n(B) \le j_{n+1}(A) \le j_n(B) + 2, \tag{10.113}$$
$$p_{n+1}(A) = 0 \Rightarrow j_{n+1}(A) \le j_n(B) + 1, \tag{10.114}$$
$$p_{n+1}(A) = 1 \Rightarrow j_{n+1}(A) = j_n(B) + 2. \tag{10.115}$$

• It is also obvious by construction that "$k$, $I$, $w$ did not change from step $n$ to step $n+1$ unless $p_{n+1}(A) = 1$":
$$p_{n+1}(A) \ne 1 \Rightarrow k_{n+1}(A) = k_n(B);\ I_{n+1}(A) = I_n(B);\ w_{n+1}(A) = w_n(B). \tag{10.116}$$

• The possibility that $p_{n+1}(A) \ge 2$ only arises from the case $p_n(B) \ge 1$, so we then have by construction
$$p_{n+1}(A) \ge 2 \Rightarrow p_n(B) = p_{n+1}(A) - 1. \tag{10.117}$$
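The relations (10.113) to (10.117) only involve the bookkeeping of $j$ and $p$ from a set $B$ to a piece $A \subset B$. As a sanity check, the following Python sketch follows one chain $A_n(t)$, $n \ge 0$. The choice among the cases (a), (b), (c) of Lemma 10.7.3 is replaced by a random choice, a hypothetical stand-in made purely for illustration (in the construction the case is dictated by the set being split), while the update rules are those stated above. It then verifies (10.113), (10.114), (10.115) and (10.117) along the chain; (10.116) is not modelled since $k$, $I$, $w$ are not tracked.

```python
import random


def simulate_chain(kappa: int, n_steps: int, seed: int = 0, j0: int = 0):
    """Follow j(n) and p(n) along one chain A_n(t), for n < n_steps.

    The case (a), (b) or (c) of Lemma 10.7.3 is drawn at random (hypothetical);
    the update rules are those of Sect. 10.8.
    """
    rng = random.Random(seed)
    j, p = [j0] * 3, [0] * 3            # n <= n0 = 2: j(n) = j0 and p(n) = 0
    for n in range(2, n_steps - 1):
        if p[n] == 0:                   # the set is split using Lemma 10.7.3
            case = rng.choice("abc")
            if case == "a":
                j_next, p_next = j[n], 0
            elif case == "b":
                j_next, p_next = j[n] + 1, 0
            else:                       # case (c): j jumps by 2, counter starts at 1
                j_next, p_next = j[n] + 2, 1
        else:                           # the set is not split, cf. (10.108)
            j_next = j[n]
            p_next = p[n] + 1 if p[n] < 4 * kappa - 1 else 0
        j.append(j_next)
        p.append(p_next)
    return j, p


def check_relations(j, p) -> bool:
    """Assert (10.113), (10.114), (10.115) and (10.117) along the chain."""
    for n in range(2, len(j) - 1):
        assert j[n] <= j[n + 1] <= j[n] + 2      # (10.113)
        if p[n + 1] == 0:
            assert j[n + 1] <= j[n] + 1          # (10.114)
        if p[n + 1] == 1:
            assert j[n + 1] == j[n] + 2          # (10.115)
        if p[n + 1] >= 2:
            assert p[n] == p[n + 1] - 1          # (10.117)
    return True


j, p = simulate_chain(kappa=3, n_steps=200, seed=1)
print(check_relations(j, p))
```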
Next we show that our construction satisfies the crucial conditions (10.105) and (10.106).
Lemma 10.8.2 Conditions (10.105) and (10.106) hold for each $n$.

Proof The proof goes by induction over $n$. We perform the induction step from $n$ to $n+1$, keeping the notation $A \subset B$, $B \in \mathcal{A}_n$, $A \in \mathcal{A}_{n+1}$. The case $p_n(B) > 0$ has been treated above, so we may assume $p_n(B) = 0$. We distinguish cases.

• We are in case (a) of Lemma 10.7.3; the piece $A$ produced has property (10.93) and $p_{n+1}(A) = 0$. Using (10.110) and Lemma 10.8.1, and since $A \subset B$, we have $\Delta_{n+1,A}(A) = \Delta_{n,B}(A) \le \Delta_{n,B}(B)$, so that (10.105) is satisfied for $A$ and $n+1$ because it was satisfied for $B$ and $n$.

• We are in case (b) of Lemma 10.7.3; the piece $A$ we produce has property (10.94) and $p_{n+1}(A) = 0$. Since $j_{n+1}(A) = j_n(B) + 1$, the condition (10.94) means exactly that $\Delta_{n+1,A}(A) \le 2^{n/2} r^{-j_{n+1}(A)}$, so that $A$ satisfies (10.105).

• We are in case (c) of Lemma 10.7.3; the piece $A$ produced has properties (10.95) and (10.96) and $p_{n+1}(A) = 1$. Then (10.96) means that $\Delta_{n+1,A}(A) \le 2^{n/2} r^{-j_n(B)}$, so that
$$\Delta_{n+1,A}(A) \le 2^{n/2} r^{-j_n(B)} = 2^{(n+1-p_{n+1}(A))/2} r^{-j_{n+1}(A)+2}$$
since $j_{n+1}(A) = j_n(B) + 2$. Thus condition (10.106) holds for $A$. □


Let us explore more properties of the construction. For $n \ge 0$, $B \in \mathcal{A}_n$ and $D \subset T$, let us define
$$F_{n,B}(D) := F(D, I_n(B), w_n(B), k_n(B), j_n(B)). \tag{10.118}$$
Lemma 10.8.3 For any $n \ge 0$, $B \in \mathcal{A}_n$ and $A \in \mathcal{A}_{n+1}$ with $A \subset B$, when $p_{n+1}(A) = 1$ we have
$$F_{n+1,A}(A) \le F_{n,B}(B) - \frac{1}{L} 2^n r^{-j_n(B)-1}, \tag{10.119}$$
$$w_{n+1}(A) \in B, \tag{10.120}$$
$$I_{n+1}(A) = \{i \in I_n(B);\ |w_{n+1}(A)_i - w_n(B)_i| \le 2r^{-k_n(B)}\}. \tag{10.121}$$
Proof The only possibility that $p_{n+1}(A) = 1$ is when we are in the case (c) above, i.e., $A$ is created by the case (c) of Lemma 10.7.3, and then $A$ has property (10.97), which translates as (10.119). The other two properties hold by construction. □


Let us now introduce new notation. For $n \ge 1$, $B \in \mathcal{A}_n$, $D \subset T$ we define
$$\Delta^*_{n,B}(D) := \Delta(D, I_n(B), w_n(B), k_n(B), j_n(B) + 2), \tag{10.122}$$
$$F^*_{n,B}(D) := F(D, I_n(B), w_n(B), k_n(B), j_n(B) + 2), \tag{10.123}$$
and we learn to distinguish these quantities from those occurring in (10.104) and (10.118): here we have $j_n(B) + 2$ rather than $j_n(B)$.


Lemma 10.8.4 Consider $B \in \mathcal{A}_n$ and $A \in \mathcal{A}_{n+1}$, $A \subset B$. If $n \ge 2$ and if $p_{n+1}(A) = 0$, either we have $p_n(B) = 4\kappa - 1$ or $j_{n+1}(A) = j_n(B) + 1$, or else we have
$$\forall D \subset A;\ \Delta^*_{n+1,A}(D) \le \frac{1}{L_4} 2^{n/2} r^{-j_{n+1}(A)-1} \Rightarrow F^*_{n+1,A}(D) \le F^*_{n+1,A}(A) - \frac{1}{L_4} 2^n r^{-j_{n+1}(A)-1}. \tag{10.124}$$
Proof We may assume that $p_{n+1}(A) = 0$, $p_n(B) \ne 4\kappa - 1$, and $j_{n+1}(A) \ne j_n(B) + 1$. The set $A$ has been produced by splitting $B$. There are three possibilities, as described on page 353. The possibility (b) is ruled out because $j_{n+1}(A) \ne j_n(B) + 1$. The possibility (c) is ruled out because $p_{n+1}(A) = 0$. So there remains only possibility (a), that is, $A$ has been created by the case (a) of Lemma 10.7.3, and then (10.93) implies (10.124). □


Let us also observe another important property of the previous construction. If $B \in \mathcal{A}_n$, $A \in \mathcal{A}_{n+1}$, $A \subset B$, then
$$F_{n+1,A}(A) \le F_{n,B}(B). \tag{10.125}$$
Indeed, if $p_{n+1}(A) \ne 1$ this follows from (10.113), (10.116), and (10.65), and if $p_{n+1}(A) = 1$, this is a consequence of (10.119).


10.9 Philosophy, IV 


Let us stress some features of the previous construction. At a high level, cases (a) and (b) are just as in the Gaussian case. In these cases we do not change $I_n(B)$, $w_n(B)$, $k_n(B)$ when going from $n$ to $n+1$. We split $B$ into sets $A$ which either
have the property that a small subset $D$ of $A$ has a small functional (as is precisely stated in (10.93)) or which are such that "$A$ is of small diameter". But the devil is in the fine print. "$A$ is of small diameter" is not what you would obtain directly from Lemma 10.7.1, a control of $\Delta(A, I_{n+1}(A), w_{n+1}(A), k_{n+1}(A), j_{n+1}(A) + 2)$. It is the much stronger control of $\Delta(A, I_{n+1}(A), w_{n+1}(A), k_{n+1}(A), j_{n+1}(A) + 1)$. This stronger control of the diameter of $A$ is essential to make the proof work and is permitted by a further splitting using Latała's principle.

The cost of using Latała's principle is that now we get a new case, (c). I like to think of this case as a really new start. We reset the values of $k_{n+1}(A)$ and $w_{n+1}(A)$, and we therefore lose a lot of the information we had gathered before. But the fundamental thing which happens in that case is that we have decreased the size of the set, as expressed by (10.119).

The counter $p_n(A)$ is not important; it is an artifact of the proof, just a way to slow down matters after we have been in case (c) so that they move at the same speed as in the other cases (instead of introducing more complicated notation).


10.10 The Key Inequality 


Given $t \in T$ and $n \ge 0$, define then
$$j_n(t) := j_n(A_n(t)),$$
where as usual $A_n(t)$ is the element of $\mathcal{A}_n$ which contains $t$. The fundamental property of the previous construction is as follows. It opens the door to the use of Theorem 9.2.4, since it controls the main quantity occurring there.

Proposition 10.10.1 We have
$$\forall t \in T,\quad \sum_{n \ge 0} 2^n r^{-j_n(t)} \le L(r^{-j_0} + b(T)). \tag{10.126}$$
Let us fix $t \in T$ and write $j(n) := j_n(t)$. We set
$$a(n) = 2^n r^{-j(n)} = 2^n r^{-j_n(A_n(t))}.$$
Let us first observe that since $j(n) = j(0)$ for $n \le 2$ we have $\sum_{n \le 2} a(n) \le L r^{-j(0)}$, so that it suffices to bound $\sum_{n \ge 3} a(n)$. Let us define
$$F(n) := F_{n,A_n(t)}(A_n(t)) \ge 0,$$
where the functional $F_{n,A}$ has been defined in (10.118). As a consequence of (10.125) the sequence $(F(n))_{n \ge 0}$ is non-increasing, and of course $F(0) \le b(T)$.¹²

Let us recall the definition $n_0 = 2$ and set
$$J_0 = \{n_0\} \cup \{n > n_0;\ j(n+1) > j(n)\},$$
which we enumerate as $J_0 = \{n_0, n_1, \ldots\}$. For $k \ge 1$, since $n_k \in J_0$ we have $j(n_k + 1) > j(n_k)$. By (10.113) we have $j(n_k + 1) \le j(n_k) + 2$. Also, for $n_k + 1 \le n < n_{k+1}$ we have $j(n+1) = j(n)$, so that
$$j(n_{k+1}) = j(n_k + 1) \le j(n_k) + 2. \tag{10.127}$$


Taking for granted that the sequence $(a(n))$ is bounded (which we will show at the very end of the proof), and observing that $a(n+1) = 2a(n)$ for $n \notin J_0$, Lemma 2.9.5 used for $\alpha = 2$ implies that $\sum_{n \ge 3} a(n) \le L\sum_{n \in J_0} a(n) = L\sum_{k \ge 0} a(n_k)$. Let us set
$$C^* = \{k \ge 0;\ \forall k' \ge 0,\ a(n_{k'}) \le 2^{|k-k'|} a(n_k)\}.$$
Using Lemma 2.9.5 again implies that
$$\sum_{k \ge 0} a(n_k) \le L\sum_{k \in C^*} a(n_k). \tag{10.128}$$


Thus, our task is to bound this latter sum. In the next section, you will find some words trying to explain the structure of the proof, although they may not make sense before one has had at least a cursory look at the arguments.
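The reduction to $C^*$ uses only the shape of the sequence $j(n)$: it is constant for $n \le 2$, non-decreasing, and increases by at most 2 at each step. The following Python sketch builds such a sequence at random (a hypothetical input, not one produced by the actual construction), computes $a(n) = 2^n r^{-j(n)}$ exactly with `fractions.Fraction` for $r = 8$, and computes $J_0$ and a finite version of $C^*$. It checks (10.127), the fact that between two consecutive elements of $J_0$ the terms $a(n)$ double at each step (so their sum is at most twice the last one), and that an index where $a(n_k)$ is maximal belongs to $C^*$, so that $C^*$ is never empty.

```python
import random
from fractions import Fraction


def make_j(n_steps: int, seed: int, j0: int = 0) -> list:
    """Hypothetical j(n): constant for n <= 2, then increments drawn from {0, 1, 2}."""
    rng = random.Random(seed)
    j = [j0, j0, j0]
    while len(j) < n_steps:
        j.append(j[-1] + rng.choice([0, 1, 2]))
    return j


def a(n: int, j: list, r: int = 8) -> Fraction:
    """a(n) = 2^n r^{-j(n)}, computed exactly (negative j is allowed)."""
    return Fraction(2) ** n * Fraction(r) ** (-j[n])


def J0(j: list, n0: int = 2) -> list:
    """J_0 = {n0} U {n > n0 : j(n+1) > j(n)}, restricted to the available indices."""
    return [n0] + [n for n in range(n0 + 1, len(j) - 1) if j[n + 1] > j[n]]


def C_star(j: list, r: int = 8) -> list:
    """Indices k with a(n_k') <= 2^{|k - k'|} a(n_k) for every k' (finite version)."""
    vals = [a(n, j, r) for n in J0(j)]
    return [k for k, v in enumerate(vals)
            if all(w <= 2 ** abs(k - q) * v for q, w in enumerate(vals))]


j = make_j(60, seed=0)
print(J0(j)[:5], C_star(j)[:5])
```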

A good part of the argument is contained in the following fact, where we use the notation $p(n) := p_n(A_n(t))$:


Lemma 10.10.2 Consider $k \in C^*$ with $k \ge 1$ and assume that
$$n_k - 1 \le m \le n^* := n_{k+1} + 1 \Rightarrow p(m) = 0. \tag{10.129}$$
Then
$$a(n_k) \le L(F(n_k) - F(n_{k+2})). \tag{10.130}$$
Proof The proof is very close to the proof of (2.94), which should be reviewed now. It will be deduced from the key property (10.124). A crucial fact here is that in the definition (10.122) of $\Delta^*_{n+1,A}(A)$, we have $j_{n+1}(A) + 2$ (and not $j_{n+1}(A)$).


¹² Actually we have $F(0) = b(T)$.
Let us fix $k$ and set $n = n_k - 1$. The reader must keep this notation in mind at all times, and remember in particular that $n + 1 = n_k$. We first prove that for $A = A_{n+1}(t) = A_{n_k}(t)$, (10.124) holds, i.e.
$$\forall D \subset A;\ \Delta^*_{n+1,A}(D) \le \frac{1}{L_4} 2^{n/2} r^{-j(n_k)-1} \Rightarrow F^*_{n+1,A}(D) \le F^*_{n+1,A}(A) - \frac{1}{L_4} 2^n r^{-j(n_k)-1}. \tag{10.131}$$


Since $p(n+1) = 0$, Lemma 10.8.4 states that there are three possibilities: either $p(n) = 4\kappa - 1$, or else $j(n) < j(n+1)$, or else (10.124) holds. We now rule out the first two possibilities. The first one is ruled out by (10.129), which asserts in particular that $p(n_k - 1) = p(n) = 0$. To rule out the second one, we assume for contradiction that $j(n) = j(n_k - 1) < j(n_k) = j(n+1)$. Then $n_k - 1 \in J_0$, so that $n_k - 1 = n_{k-1}$. But since $k \in C^*$, we have $a(n_{k-1}) \le 2a(n_k)$, i.e., $r^{j(n_k) - j(n_{k-1})} \le 2^{n_k - n_{k-1} + 1} = 4$. Since $j(n_k) - j(n_{k-1}) \ge 1$, this is a contradiction since $r \ge 8$, which proves that $j(n) = j(n+1)$.

Thus we have proved (10.131), which is the crux of the proof.

It follows from (10.114) that for any $m \in J_0$, we have $j(m+1) = j(m) + 1$ when $p(m+1) = 0$. In particular (10.129) implies that this is the case for $m = n_k$ and $m = n_{k+1}$, so that, using also (10.127) in the third equality,
$$j(n^*) = j(n_{k+1} + 1) = j(n_{k+1}) + 1 = j(n_k + 1) + 1 = j(n_k) + 2, \tag{10.132}$$
i.e.
$$j_{n^*}(A_{n^*}(t)) = j_{n_k}(A_{n_k}(t)) + 2. \tag{10.133}$$
Furthermore (10.116) implies
$$w_{n^*}(A_{n^*}(t)) = w_{n_k}(A_{n_k}(t));\quad I_{n^*}(A_{n^*}(t)) = I_{n_k}(A_{n_k}(t));\quad k_{n^*}(A_{n^*}(t)) = k_{n_k}(A_{n_k}(t)). \tag{10.134}$$


We will prove later that
$$\Delta^*_{n+1,A}(A_{n^*}(t)) \le \frac{1}{L_4} 2^{n/2} r^{-j(n_k)-1}. \tag{10.135}$$
For the time being, we assume that (10.135) holds, and we show how to conclude the proof of the lemma. Recalling that $n_k = n + 1$, so that $j_{n+1}(A) = j_{n_k}(A) = j_{n_k}(A_{n_k}(t)) = j(n_k)$, we use (10.131) to obtain
$$F^*_{n+1,A}(A_{n^*}(t)) \le F^*_{n+1,A}(A) - \frac{1}{L_4} 2^n r^{-j(n_k)-1}. \tag{10.136}$$
Using the monotonicity (10.65) of $F$ in the parameter $j$ in the inequality, we obtain
$$\begin{aligned} F^*_{n+1,A}(A) &= F(A_{n_k}(t), I_{n_k}(A_{n_k}(t)), w_{n_k}(A_{n_k}(t)), k_{n_k}(A_{n_k}(t)), j(n_k) + 2) \\ &\le F(A_{n_k}(t), I_{n_k}(A_{n_k}(t)), w_{n_k}(A_{n_k}(t)), k_{n_k}(A_{n_k}(t)), j(n_k)) = F(n_k). \end{aligned} \tag{10.137}$$
Using (10.134) and, in an absolutely crucial manner, that $j(n^*) = j(n_k) + 2$ by (10.132), we obtain
$$\begin{aligned} F^*_{n+1,A}(A_{n^*}(t)) &= F(A_{n^*}(t), I_{n_k}(A_{n_k}(t)), w_{n_k}(A_{n_k}(t)), k_{n_k}(A_{n_k}(t)), j(n_k) + 2) \\ &= F(A_{n^*}(t), I_{n^*}(A_{n^*}(t)), w_{n^*}(A_{n^*}(t)), k_{n^*}(A_{n^*}(t)), j(n^*)) = F(n^*). \end{aligned} \tag{10.138}$$
Thus (10.136) implies
$$F(n^*) \le F(n_k) - \frac{1}{L_4} 2^n r^{-j(n_k)-1},$$
i.e., $a(n_k) \le L(F(n_k) - F(n^*)) \le L(F(n_k) - F(n_{k+2}))$ by (10.125), and this concludes the proof of the lemma.
We turn to the proof of (10.135). Using (10.132) and (10.134), we first obtain
$$\begin{aligned} \Delta^*_{n+1,A}(A_{n^*}(t)) &= \Delta(A_{n^*}(t), I_{n_k}(A_{n_k}(t)), w_{n_k}(A_{n_k}(t)), k_{n_k}(A_{n_k}(t)), j(n_k) + 2) \\ &= \Delta(A_{n^*}(t), I_{n^*}(A_{n^*}(t)), w_{n^*}(A_{n^*}(t)), k_{n^*}(A_{n^*}(t)), j(n^*)) \\ &= \Delta_{n^*, A_{n^*}(t)}(A_{n^*}(t)). \end{aligned} \tag{10.139}$$
Here also it is crucial that $j(n^*) = j(n_k) + 2$. Then we use (10.105) for $n^*$ instead of $n$ and $A = A_{n^*}(t)$ to obtain
$$\Delta_{n^*, A_{n^*}(t)}(A_{n^*}(t)) \le 2^{n^*/2} r^{-j(n^*)}. \tag{10.140}$$
By the definition of $C^*$, we have $a(n_k) \ge a(n_{k+1})/2$, i.e.
$$2^{n_k} r^{-j(n_k)} \ge 2^{n_{k+1}} r^{-j(n_{k+1})}/2,$$
and thus, using again that $j(n_{k+1}) = j(n_k) + 1$,
$$2^{n_{k+1} - n_k} \le 2r = 2^{\kappa+1},$$
and therefore $n_{k+1} - n_k \le \kappa + 1$, so that $n^* = n_{k+1} + 1 \le n_k + \kappa + 2$, and since $j(n^*) = j(n_k) + 2$, we have $2^{n^*/2} r^{-j(n^*)} \le 2^{(\kappa+2)/2} 2^{n_k/2} r^{-j(n_k)-2}$. Using that $2^{\kappa/2} \ge 2L_4$ and $r = 2^\kappa$, so that $2^{(\kappa+1)/2} r^{-1} = 2^{(1-\kappa)/2} \le 1/L_4$, we get
$$2^{n^*/2} r^{-j(n^*)} \le 2^{(\kappa+2)/2} 2^{n_k/2} r^{-j(n_k)-2} \le \frac{1}{L_4} 2^{n/2} r^{-j(n_k)-1}. \tag{10.141}$$
Comparing with (10.140) and (10.139) yields the desired inequality (10.135). □


Corollary 10.10.3 Consider the subset $C$ of $C^*$ consisting of the integers $k \ge 1$, $k \in C^*$, for which (10.129) holds. Then
$$\sum_{k \in C} a(n_k) \le L b(T). \tag{10.142}$$
Proof This follows from the usual "telescoping sum" argument, together with the fact that $F(n) \le L b(T)$. □
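The "telescoping sum" argument can be made explicit. Since $F$ is non-increasing and non-negative, splitting the indices $k$ according to their parity gives two telescoping sums, so that for any set $C$ of indices (the terms being non-negative) $\sum_{k \in C}(F(n_k) - F(n_{k+2})) \le F(n_0) + F(n_1) \le 2F(0)$. Combined with (10.130) and $F(0) \le b(T)$, this gives (10.142). The following Python sketch checks the inequality on random non-increasing sequences, which are hypothetical data standing in for $F(n)$.

```python
import random


def telescoping_check(seed: int, length: int = 200) -> bool:
    """Check sum_{k in C} (F(n_k) - F(n_{k+2})) <= 2 F(0) on random data."""
    rng = random.Random(seed)
    F = [rng.uniform(0.0, 10.0)]
    for _ in range(length - 1):
        F.append(F[-1] * (1.0 - rng.uniform(0.0, 0.2)))  # non-increasing and >= 0
    nk = sorted(rng.sample(range(length), 30))             # n_0 < n_1 < ...
    C = [k for k in range(len(nk) - 2) if rng.random() < 0.5]
    total = sum(F[nk[k]] - F[nk[k + 2]] for k in C)
    return total <= 2 * F[0] + 1e-12


print(all(telescoping_check(s) for s in range(20)))
```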


So, we have now reduced the task to controlling $a(n_k)$ when $k \in C^* \setminus C$. We start with a simple observation.

Lemma 10.10.4 We have
$$p(m) > 0 \Rightarrow j(m) = j(m+1), \tag{10.143}$$
$$\forall k,\quad p(n_k) = 0. \tag{10.144}$$
Proof Condition (10.143) holds by construction; see (10.108). Condition (10.144) is a corollary, since $j(n_k) < j(n_k + 1)$. □


The following lemma gives us precise information on these integers $k \in C^* \setminus C$.

Lemma 10.10.5 If for a certain $k \in C^*$ with $k > 0$ we have $p(n_k - 1) = p(n_k + 1) = p(n_{k+1} + 1) = 0$, then $k \in C$.

Proof We have to prove that $p(m) = 0$ for $n_k - 1 \le m \le n_{k+1} + 1$. Assume for contradiction that there exists $n_k - 1 \le m \le n_{k+1} + 1$ with $p(m) > 0$, and consider the smallest such $m$. Then certainly $n_k + 1 < m < n_{k+1}$, since $p(n_k - 1) = p(n_k + 1) = p(n_{k+1} + 1) = 0$ by hypothesis and since $p(n_k) = p(n_{k+1}) = 0$ by (10.108). Next we prove that $p(m) = 1$. Indeed, otherwise $p(m) \ge 2$ and by (10.117) we have $p(m-1) = p(m) - 1 \ge 1$, which contradicts the minimality of $m$. Thus $p(m) = 1$. But by construction, when $p(m) = 1$ then $j(m) = j(m-1) + 2$ (see (10.115)), so that $m - 1 \in J_0$. But since $n_k < m - 1 < n_{k+1}$, this contradicts the definition of $n_{k+1}$, which is the smallest element of $J_0$ larger than $n_k$. □
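Lemma 10.10.5 is a purely combinatorial statement about the sequences $j(n)$ and $p(n)$, so it can be tested by brute force. The following Python sketch generates a hypothetical chain in which, whenever $p(n) = 0$, the case (a), (b) or (c) of Lemma 10.7.3 is drawn at random (an assumption for illustration only; the other updates follow the rules of Sect. 10.8). For every $k \ge 1$ satisfying the hypothesis of the lemma it checks that $p(m) = 0$ for $n_k - 1 \le m \le n_{k+1} + 1$, i.e., that (10.129) holds, and it also checks (10.144) and $p(n_k + 1) \le 1$.

```python
import random


def simulate_chain(kappa: int, n_steps: int, seed: int = 0, j0: int = 0):
    """j(n), p(n) along one chain; the case (a)/(b)/(c) is drawn at random (hypothetical)."""
    rng = random.Random(seed)
    j, p = [j0] * 3, [0] * 3                   # n <= n0 = 2: j(n) = j0 and p(n) = 0
    for n in range(2, n_steps - 1):
        if p[n] == 0:
            jump, p_next = rng.choice([(0, 0), (1, 0), (2, 1)])  # cases (a), (b), (c)
            j.append(j[n] + jump)
            p.append(p_next)
        else:                                  # unsplit set: only the counter moves
            j.append(j[n])
            p.append(p[n] + 1 if p[n] < 4 * kappa - 1 else 0)
    return j, p


def check_lemma_10_10_5(j, p, n0: int = 2) -> int:
    """Assert the lemma along the chain; return how many k satisfied its hypothesis."""
    nk = [n0] + [n for n in range(n0 + 1, len(j) - 1) if j[n + 1] > j[n]]
    tested = 0
    for k in range(1, len(nk) - 1):
        lo, hi = nk[k], nk[k + 1]
        assert p[lo] == 0 and p[lo + 1] <= 1                     # (10.144), p(n_k + 1) <= 1
        if p[lo - 1] == p[lo + 1] == p[hi + 1] == 0:
            assert all(p[m] == 0 for m in range(lo - 1, hi + 2))  # (10.129)
            tested += 1
    return tested


j, p = simulate_chain(kappa=3, n_steps=400, seed=3)
print(check_lemma_10_10_5(j, p))
```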


Corollary 10.10.6 For $k \in C^* \setminus C$, we have either $k = 0$ or $p(n_k - 1) > 0$ or $p(n_k + 1) = 1$ or $p(n_{k+1} + 1) = 1$.

Proof According to Lemma 10.10.5, if $k \ge 1$ does not belong to $C$, then one of the quantities $p(n_k - 1)$, $p(n_k + 1)$, $p(n_{k+1} + 1)$ is $> 0$. Now, since $p(n_k) = 0$ and $p(n+1) \le p(n) + 1$, we have $p(n_k + 1) \le 1$, and similarly $p(n_{k+1} + 1) \le 1$. □


The goal now is to produce specific arguments to control $a(n_k)$ in the various situations which can happen when $k \in C^* \setminus C$, as brought forward by the previous corollary.


Lemma 10.10.7 Let $J_1 = \{n \ge 0;\ p(n+1) = 1\}$. Then
$$\sum_{n \in J_1} a(n) \le L b(T). \tag{10.145}$$
Proof Indeed (10.119) implies that for $n \in J_1$,
$$a(n) \le L(F(n) - F(n+1)),$$
and the telescopic sum is bounded by $LF(0) \le L b(T)$. □
Corollary 10.10.8 Let $C_1 = \{k \in C^*;\ p(n_k + 1) = 1\} = \{k \in C^*;\ n_k \in J_1\}$. Then
$$\sum_{k \in C_1} a(n_k) \le L b(T). \tag{10.146}$$

Lemma 10.10.9 Let $C_2 := \{k \ge 0;\ n_{k+1} \in J_1\}$. Then
$$\sum_{k \in C_2} a(n_k) \le L b(T). \tag{10.147}$$
Proof We have
$$a(n_k) = 2^{n_k} r^{-j(n_k)} \le r^2\, 2^{n_{k+1}} r^{-j(n_{k+1})} = r^2 a(n_{k+1}),$$
where we have used in the inequality that $j(n_{k+1}) \le j(n_k) + 2$ by (10.113), and therefore (10.146) implies the result. □
Lemma 10.10.10 Let $C_3 := \{k \ge 0;\ p(n_k - 1) > 0\}$. Then
$$\sum_{k \in C_3} a(n_k) \le L b(T). \tag{10.148}$$
Proof Let us recall that by construction $p(n+1) = p(n) + 1$ when $1 \le p(n) \le 4\kappa - 2$. Consequently the only possibility to have $p(n) > 0$ and $p(n+1) = 0$ is to have $p(n) = 4\kappa - 1$. Also, since $j(n_k + 1) > j(n_k)$, by construction we have $p(n_k) = 0$. Thus for $k \in C_3$ we have $p(n_k) = 0$ and $p(n_k - 1) > 0$, so that $p(n_k - 1) = 4\kappa - 1$, and then $p(n_k - 4\kappa + 1) = 1$, i.e., $n_k - 4\kappa \in J_1$. Iteration of the relation $a(n) \le 2a(n-1)$ shows that $a(n_k) \le 2^{4\kappa} a(n_k - 4\kappa)$. Thus $\sum_{k \in C_3} a(n_k) \le L\sum_{k \in C_3} a(n_k - 4\kappa) \le L\sum_{n \in J_1} a(n)$, and the result follows from (10.145). □
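The elementary comparisons used in Lemmas 10.10.9 and 10.10.10 only use that $j(n)$ is non-decreasing with increments at most 2: $a(n) \le 2a(n-1)$, hence $a(n) \le 2^{4\kappa} a(n - 4\kappa)$, and $a(n_k) \le r^2 a(n_{k+1})$. The following Python sketch checks them exactly, with $\kappa = 3$ and $r = 2^\kappa = 8$, on a randomly generated sequence $j(n)$ that is a hypothetical stand-in for the one produced by the construction.

```python
import random
from fractions import Fraction

KAPPA = 3
R = 2 ** KAPPA  # r = 2^kappa


def make_j(n_steps: int, seed: int) -> list:
    """Hypothetical j(n): constant for n <= 2, then increments drawn from {0, 1, 2}."""
    rng = random.Random(seed)
    j = [0, 0, 0]
    while len(j) < n_steps:
        j.append(j[-1] + rng.choice([0, 1, 2]))
    return j


def a(n: int, j: list) -> Fraction:
    """a(n) = 2^n r^{-j(n)}, computed exactly."""
    return Fraction(2) ** n * Fraction(R) ** (-j[n])


def check_growth(j: list) -> None:
    for n in range(1, len(j)):
        assert a(n, j) <= 2 * a(n - 1, j)                          # j is non-decreasing
    for n in range(4 * KAPPA, len(j)):
        assert a(n, j) <= 2 ** (4 * KAPPA) * a(n - 4 * KAPPA, j)    # iterate 4*kappa times
    nk = [2] + [n for n in range(3, len(j) - 1) if j[n + 1] > j[n]]
    for k in range(len(nk) - 1):
        assert a(nk[k], j) <= R ** 2 * a(nk[k + 1], j)             # as in Lemma 10.10.9


check_growth(make_j(80, seed=0))
```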


The following is a consequence of Corollary 10.10.6 and the subsequent lemmas:

Proposition 10.10.11 We have
$$\sum_{k \in C^*} a(n_k) \le L(r^{-j_0} + b(T)). \tag{10.149}$$

Proof of Proposition 10.10.1 It follows by combining (10.128) and (10.149). But it remains to prove that the sequence $(a(n))$ is bounded. Using a much simplified version of the previous arguments, we prove that in fact
$$\forall n,\quad a(n) \le L(r^{-j_0} + b(T)). \tag{10.150}$$
By (10.145) this is true if $n \in J_1$. Next we recall that if $p(n) \ge 2$ then $p(n-1) = p(n) - 1$. Consequently, if $n \in J_2 := \{n;\ p(n-1) + p(n) > 0\}$, there exists $n' \in J_1$ with $n' \ge n - 4\kappa$. Also, since $a(m+1) \le 2a(m)$, we have $a(n) \le La(n')$, and we have shown that (10.150) holds for $n \in J_2$. Next we show that it also holds for $n \in J_3 := \{n \ge 3;\ j(n-1) = j(n),\ p(n-1) = p(n) = 0\}$. Indeed, in that case we use Lemma 10.8.4 for $n - 1$ rather than $n$. Since $p(n) = p(n-1) = 0$ and $j(n) = j(n-1)$, we are in the third case of the lemma, and (10.124) holds. Taking $D$ reduced to a single point proves the required inequality $a(n) \le LF(n)$. Now, if $n \ge 3$ and $n \notin J_2 \cup J_3$, we have $j(n) > j(n-1)$, and since $r \ge 4$, we have $a(n-1) \ge a(n)$. So to prove (10.150), we may replace $n$ by $n - 1$. Iteration of the procedure until we reach a point of $J_2 \cup J_3 \cup \{1, 2\}$ concludes the argument. □


10.11 Philosophy, V 


In this section we try to describe at a high level some features of the previous proof. When in that proof $j(n_k + 1) > j(n_k)$, according to (10.114) and (10.115), there are two cases. First, it might happen that $p(n_k + 1) = 1$ and $j(n_k + 1) = j(n_k) + 2$. Second, it might happen that $p(n_k + 1) = 0$ and $j(n_k + 1) = j(n_k) + 1$. Let us think of the first case as an exceptional case. This exceptional case is a good thing, because we then have no problem controlling $a(n_k)$ thanks to (10.145). As expressed in (10.147) and (10.148), for semi-trivial reasons, we have no problem either in controlling $a(n_k)$ when $p(n_k - 1) = 1$ or $p(n_{k+1} + 1) = 1$, so that, so to speak, the problem is to control $a(n_k)$ in the special case where there is no exceptional value $k'$ near $k$. This is what Lemma 10.10.2 does. In that lemma we get by construction the information (10.131) on the small subsets $D$ of $A = A_{n_k}(t)$. The idea is to use that information on a set $D = A_{n'}(t)$. For this we need to control the diameter of $A_{n'}(t)$. We should think of this diameter as governed by $2^{n'/2} r^{-j(n')}$.
For this to be small enough, we need $j(n') \ge j(n_k) + 2$. The smallest value of $n'$ for which this happens is $n^* = n_{k+1} + 1$. An important feature of the argument is that our bound on the size of $A_{n_{k+1}+1}(t)$ is smaller by a factor $2/r$ than the bound we had on the size of $A_{n_{k+1}}(t)$ (which itself is not so much larger than the bound we had on the size of $A_{n_k}(t)$, thanks to the use of Lemma 2.9.5). In this manner we obtain the control $a(n_k) \le L(F(n_k) - F(n^*))$.


10.12 Proof of the Latała-Bednorz Theorem


Without loss of generality, we assume $0\in T$. First we use Lemma 6.3.6 to find $j_0$ such that $\Delta(T,d_2)\le2r^{-j_0}\le Lb(T)$, so that in particular $|t_i|\le r^{-j_0}/2$ for $t\in T$ and $i\in\mathbb N$. We then build a sequence of partitions as in Sect. 10.8, using this value of $j_0$. Then (10.126) yields

$$\sup_{t\in T}\sum_{n\ge0}2^nr^{-j_n(t)}\le Lb(T).\tag{10.151}$$

The plan is to produce the required decomposition of $T$ using Theorem 9.2.4 for $\Omega=\mathbb N$ and $\mu$ the counting measure (and using also (10.151)). To apply this theorem, given $n$ and given $A\in\mathcal A_n$, we will define an element $\pi_n(A)\in T$. We will then define $\pi_n(t)=\pi_n(A_n(t))$. However, in contrast with what happened in many previous proofs, we will not require that $\pi_n(A)\in A$. It could be helpful to recall (9.9) and (9.10), which are the most stringent of the conditions we require on $\pi_n(t)$:

$$\forall t\in T,\ \forall n\ge0,\quad j_n(t)=j_{n+1}(t)\Rightarrow\pi_n(t)=\pi_{n+1}(t),\tag{9.9}$$
$$\forall t\in T,\ \forall n\ge0,\quad j_{n+1}(t)>j_n(t)\Rightarrow\pi_{n+1}(t)\in A_n(t).\tag{9.10}$$


The construction of the points $\pi_n(A)$ proceeds as follows. We choose $\pi_0(T)=0$. Consider $A\in\mathcal A_{n+1}$ and $A\subset B$, $B\in\mathcal A_n$. According to (10.113) there are three disjoint cases which cover all possibilities:

• $j_{n+1}(A)=j_n(B)$. We then set $\pi_{n+1}(A)=\pi_n(B)$.

• $j_{n+1}(A)=j_n(B)+1$. We take for $\pi_{n+1}(A)$ any point of $B$.

• $j_{n+1}(A)=j_n(B)+2$. According to (10.114) we then have $p_{n+1}(A)=1$, so that we are in the case (c) considered on page 353. We set $\pi_{n+1}(A)=w_{n+1}(A)$, so that $\pi_{n+1}(A)\in B$ using (10.120).


The important property (which obviously holds by construction) is that

$$j_{n+1}(A)=j_n(B)\Rightarrow\pi_{n+1}(A)=\pi_n(B),\tag{10.152}$$
$$j_{n+1}(A)>j_n(B)\Rightarrow\pi_{n+1}(A)\in B.\tag{10.153}$$

Defining $\pi_n(t)=\pi_n(A_n(t))$, this implies that (9.9) and (9.10) hold, while (9.7) is obvious by construction. Also, according to (10.114) and (10.115), we have $p_{n+1}(A)=1$ if and only if $j_{n+1}(A)=j_n(B)+2$, so that
$$p_{n+1}(A)=1\Rightarrow\pi_{n+1}(A)=w_{n+1}(A).\tag{10.154}$$
Let us consider the set
$$\Omega_n(t)=\{i\in\mathbb N^*;\ \forall q<n,\ |\pi_{q+1}(t)_i-\pi_q(t)_i|\le r^{-j_q(t)}\}.\tag{10.155}$$

The key of the argument is to establish the inequality

$$\forall t\in T,\ \forall n\ge0,\quad\sum_{i\in\Omega_n(t)}|r^{j_n(t)}(t_i-\pi_n(t)_i)|^2\wedge1\le L2^n.\tag{10.156}$$

This inequality means that (9.12) holds for $u=L$. We can then apply Theorem 9.2.4 to obtain the required decomposition of $T$, since $T_3=\{0\}$ by the choice of $j_0$.
We start the preparations for the proof of (10.156). Let us define
$$k_n(t)=k_n(A_n(t));\quad w_n(t)=w_n(A_n(t));\quad p_n(t)=p_n(A_n(t)).$$
Then by (10.116) we have
$$p_{n+1}(t)\ne1\Rightarrow w_{n+1}(t)=w_n(t);\ k_{n+1}(t)=k_n(t),\tag{10.157}$$
and (10.154) implies
$$p_{n+1}(t)=1\Rightarrow\pi_{n+1}(t)=w_{n+1}(t).\tag{10.158}$$
Also, since $k_n(A)\le j_n(A)$ for $A\in\mathcal A_n$, we have
$$k_n(t)\le j_n(t).\tag{10.159}$$
Our next goal is to prove the inequality
$$i\in\Omega_{n+1}(t)\Rightarrow|\pi_{n+1}(t)_i-w_n(t)_i|\le2r^{-k_n(t)}.\tag{10.160}$$
To prepare for the proof, let $J'=\{0\}\cup\{n';\ p_{n'}(t)=1\}$. Given $n\ge0$ let us consider the largest $n'\in J'$ with $n'\le n$. Then by definition of $n'$, for $n'\le q<n$ we have $p_{q+1}(t)\ne1$, so that by (10.157) we have $w_{q+1}(t)=w_q(t)$ and $k_{q+1}(t)=k_q(t)$. Consequently we have

$$w_n(t)=w_{n'}(t);\quad k_n(t)=k_{n'}(t).\tag{10.161}$$

We prove next that $\pi_{n'}(t)=w_n(t)$. If $n'=0$ this is true because we defined $w_0(T)=\pi_0(T)=0$. If $n'>0$, by (10.158) we have $\pi_{n'}(t)=w_{n'}(t)$, and recalling (10.161) we have proved that $\pi_{n'}(t)=w_n(t)$ as desired.

We observe next that by (10.159) we have $k_{n'}(t)\le j_{n'}(t)$. Recalling (10.161), to prove (10.160) it suffices to prove that $|\pi_{n+1}(t)_i-\pi_{n'}(t)_i|\le2r^{-j_{n'}(t)}$. We write

$$|\pi_{n+1}(t)_i-\pi_{n'}(t)_i|\le\sum_{n'\le q\le n}|\pi_{q+1}(t)_i-\pi_q(t)_i|=\sum_{q\in U}|\pi_{q+1}(t)_i-\pi_q(t)_i|,\tag{10.162}$$

where
$$U=\{q;\ n'\le q\le n,\ \pi_{q+1}(t)-\pi_q(t)\ne0\}.$$

Now, recalling (10.155), since $i\in\Omega_{n+1}(t)$, for $q\in U$ we have $|\pi_{q+1}(t)_i-\pi_q(t)_i|\le r^{-j_q(t)}$, so that by (10.162) we get

$$|\pi_{n+1}(t)_i-\pi_{n'}(t)_i|\le\sum_{q\in U}r^{-j_q(t)}.\tag{10.163}$$

Since the sequence $j_q(t)$ is non-decreasing, for $q\in U$ we have $j_q(t)\ge j_{n'}(t)$. Also, by (10.152), for $q\in U$ we have $j_{q+1}(t)\ne j_q(t)$, so that the numbers $j_q(t)$ for $q\in U$ are all different and the sum on the right-hand side of (10.163) is $\le\sum_{j\ge j_{n'}(t)}r^{-j}\le2r^{-j_{n'}(t)}$, proving (10.160).
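The proof of (10.160) uses only two facts: on $\Omega_{n+1}(t)$ every increment of the chain is at most $r^{-j_q(t)}$, and the chain moves only when $j_q(t)$ changes, so the exponents that contribute are distinct and the geometric sum $\sum_{j\ge j'}r^{-j}=r^{-j'}\,r/(r-1)$ is at most $2r^{-j'}$ for $r\ge2$. The following is a minimal sketch that checks this mechanism on a toy chain, assuming a hypothetical random generator (`make_chain`, `omega` and `check_bound` are illustrative names, not from the text), an integer $r$, and exact rational arithmetic so that no rounding can blur the comparison.

```python
import random
from fractions import Fraction

def omega(pi, j, r, n):
    """Omega_n as in (10.155): coordinates i with |pi_{q+1,i} - pi_{q,i}| <= r^{-j_q} for all q < n."""
    d = len(pi[0])
    return {i for i in range(d)
            if all(abs(pi[q + 1][i] - pi[q][i]) <= Fraction(1, r ** j[q]) for q in range(n))}

def make_chain(steps, d, r, seed=0):
    """Hypothetical toy chain: j_q is non-decreasing with increments 0, 1 or 2 (compare (10.113)),
    and pi_{q+1} = pi_q whenever j_{q+1} = j_q (the rule (10.152))."""
    rng = random.Random(seed)
    j = [0]
    pi = [[Fraction(0)] * d]
    for _ in range(steps):
        jump = rng.choice([0, 0, 1, 2])
        j.append(j[-1] + jump)
        if jump == 0:
            pi.append(list(pi[-1]))  # j unchanged, so the point is unchanged
        else:
            # move every coordinate by up to 2 r^{-j_q}; coordinates moved too far leave Omega
            scale = 2.0 / r ** j[-2]
            pi.append([x + Fraction(rng.uniform(-scale, scale)) for x in pi[-1]])
    return pi, j

def check_bound(pi, j, r):
    """Check |pi_{n+1,i} - pi_{n',i}| <= 2 r^{-j_{n'}} for all n' <= n and all i in Omega_{n+1}."""
    for n in range(len(j) - 1):
        kept = omega(pi, j, r, n + 1)
        for n_prime in range(n + 1):
            bound = Fraction(2, r ** j[n_prime])
            if any(abs(pi[n + 1][i] - pi[n_prime][i]) > bound for i in kept):
                return False
    return True
```

For example, `pi, j = make_chain(steps=20, d=30, r=4)` followed by `check_bound(pi, j, 4)` should return `True` for every seed, since the check is an instance of the argument above rather than a statistical statement.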

Next we prove by induction over $n$ that $\Omega_n(t)\subset I_n(t):=I_n(A_n(t))$. This holds for $n=0$ since $I_0(t)=\mathbb N^*$. The argument for the induction from $n$ to $n+1$ depends on the value of $p_{n+1}(t)$. We start with the easy case where $p_{n+1}(t)\ne1$. Then $\Omega_{n+1}(t)\subset\Omega_n(t)\subset I_n(t)$ and $I_{n+1}(t)=I_n(t)$ by (10.116), concluding the argument. Let us now assume that $p_{n+1}(t)=1$. We first note that according to (10.112) we then have

$$I_{n+1}(t)=\{i\in I_n(t);\ |w_{n+1}(t)_i-w_n(t)_i|\le2r^{-k_n(t)}\}.\tag{10.164}$$

Also, by construction of $\pi_{n+1}(A)$, we have $\pi_{n+1}(t)=w_{n+1}(t)$, and then (10.160) implies that for $i\in\Omega_{n+1}(t)$ we have $|w_{n+1}(t)_i-w_n(t)_i|\le2r^{-k_n(t)}$. Combining with the induction hypothesis, (10.164) concludes the proof that $\Omega_{n+1}(t)\subset I_{n+1}(t)$ and the induction.

Using (10.160) for $n-1$ rather than $n$, we obtain

$$i\in\Omega_n(t)\Rightarrow|\pi_n(t)_i-w_{n-1}(t)_i|\le2r^{-k_{n-1}(t)}.\tag{10.165}$$

Since $\Omega_n(t)\subset I_n(t)\subset I_{n-1}(t)$, it follows from (10.69) that

$$\begin{aligned}\sum_{i\in\Omega_n(t)}|r^{j_n(t)}(t_i-\pi_n(t)_i)|^2\wedge1&\le r^{2j_n(t)}d_{w_{n-1}(t),I_{n-1}(t),k_{n-1}(t),j_n(t)}(t,\pi_n(t))^2\\&\le r^{2j_n(t)}d_{w_{n-1}(t),I_{n-1}(t),k_{n-1}(t),j_{n-1}(t)}(t,\pi_n(t))^2,\end{aligned}\tag{10.166}$$

where in the second line we use that $j_{n-1}(t)\le j_n(t)$ and that $d_{w,I,k,j}$ decreases when $j$ increases.

Finally we are ready to prove the main inequality (10.156). Let us assume first that $j_{n-1}(t)<j_n(t)$. In that case by (10.153) we have $\pi_n(t)\in A_{n-1}(t)$, so that the right-hand side of (10.166) is bounded by

$$r^{2j_n(t)}\Delta\big(A_{n-1}(t),d_{w_{n-1}(t),I_{n-1}(t),k_{n-1}(t),j_{n-1}(t)}\big)^2\le L2^nr^{2j_n(t)-2j_{n-1}(t)},$$

where we have used (10.105) for $n-1$ rather than $n$ in the inequality. This concludes the proof of (10.156) in that case since $j_{n-1}(t)\ge j_n(t)-2$ and $r$ is a universal constant.

To prove (10.156) in general, we proceed by induction over $n$. The inequality holds for $n=0$ by our choice of $j_0$. For the induction step, according to the previous result, it suffices to consider the case where $j_n(t)=j_{n-1}(t)$. Then, according to (10.152) we have $\pi_n(t)=\pi_{n-1}(t)$, so the induction step is immediate since $\Omega_n(t)\subset\Omega_{n-1}(t)$. □


10.13 Philosophy, VI 


Maybe we should stress how devilishly clever the previous argument is. On the one hand, we have the information (10.153) on $\pi_n(t)$, which, together with (10.105), allows us to control the right-hand side of (10.166). But we do not care about the right-hand side of this inequality; we care about the left-hand side. In order to be able to relate them using (10.69), we need to control the difference $|\pi_n(t)_i-w_{n-1}(t)_i|$ for many coordinates $i$. The coordinates for which we can achieve that depend on $t$.

Let us try to take the mystery out of the interplay between the sets $\Omega_n(t)$ and $I_n(t)$. The magic is already in the definition of the sets $\Omega_n(t)$: when going from $\Omega_n(t)$ to $\Omega_{n+1}(t)$, we drop the coordinates $i$ for which $\pi_{n+1}(t)_i$ is significantly different from $\pi_n(t)_i$. Another important feature is that $w_n(t)=w_{n+1}(t)$ unless $p_{n+1}(t)=1$, i.e., unless we are in the case (c) of Lemma 10.7.3. We then use a marvelously simple device. Each time we have just changed the value of $w_n(t)$ (i.e., we are at stage $n+1$ with $p_{n+1}(t)=1$), we ensure that $\pi_{n+1}(t)=w_{n+1}(t)$. The point of doing this is that for the coordinates $i$ we have kept (i.e., those belonging to the set $\Omega_{n+1}(t)$), the value of $\pi_{n+1}(t)_i$ is nearly the same as the value $w_{n'}(t)_i$, where $n'$ is the last time we changed the value of $w_n(t)$. This is true whatever our choice for $\pi_{n+1}(t)$, and in particular for $\pi_{n+1}(t)=w_{n+1}(t)$. Thus we are automatically assured that for the coordinates $i$ we keep, $w_{n+1}(t)_i$ is nearly $w_n(t)_i$ (i.e., that $\Omega_{n+1}(t)\subset I_{n+1}(t)$).

The purpose of (10.120) is precisely to be able to set $\pi_{n+1}(A)=w_{n+1}(A)$ when $p_{n+1}(A)=1$ while respecting the crucial condition (10.153).


10.14 A Geometric Characterization of b(T) 


The majorizing measure theorem (Theorem 2.10.1) asserts that for a subset $T$ of $\ell^2$, the "probabilistic" quantity $g(T)=\mathrm E\sup_{t\in T}\sum_{i\ge1}g_it_i$ is of the same order as the "geometric" quantity $\gamma_2(T,d)$. In this section we prove a similar result for the probabilistic quantity $b(T)$. The corresponding geometric quantity will use a familiar "family of distances" which we recall now. We fix $r\ge8$, and for $j\in\mathbb Z$ and $s,t\in\ell^2$, we define (as we have done throughout Chap. 7)

$$\varphi_j(s,t)=\sum_{i\ge1}|r^j(s_i-t_i)|^2\wedge1.\tag{10.167}$$

Let us then consider the following "geometric measure of size" of a subset $T$ of $\ell^2$:


Definition 10.14.1 Given a subset $T$ of $\ell^2$, we denote by $\tilde b(T)$ the infimum of the numbers $S$ for which there exists an admissible sequence $(\mathcal A_n)$ of partitions of $T$, and for $A\in\mathcal A_n$, an integer $j_n(A)$ with the following properties:

$$s,t\in A\in\mathcal A_n\Rightarrow\varphi_{j_n(A)}(s,t)\le2^n,\tag{10.168}$$
$$\Delta(T,d_2)\le r^{-j_0(T)},\tag{10.169}$$
$$\sup_{t\in T}\sum_{n\ge0}2^nr^{-j_n(A_n(t))}\le S.\tag{10.170}$$

We recall the quantity $b^*(T)$ from Definition 6.2.6.
Theorem 10.14.2 For a subset $T$ of $\ell^2$ one has

$$\tilde b(T)\le Lr^2b^*(T).\tag{10.171}$$

This theorem is a kind of converse to Theorem 2.10.1. Together with Corollary 9.4.3, it shows that the measures of size $b^*(T)$ and $\tilde b(T)$ are equivalent. The Latała-Bednorz theorem shows that the measures of size $b(T)$ and $b^*(T)$ are equivalent. Thus all three measures of size $b(T)$, $b^*(T)$, and $\tilde b(T)$ are equivalent. The equivalence of $b(T)$ and $\tilde b(T)$ parallels, for Bernoulli processes, the equivalence of $g(T)$ and $\gamma_2(T,d)$ for Gaussian processes. Observe that the proof of the equivalence of $b(T)$ and $\tilde b(T)$ is somewhat indirect, since we show that both quantities are equivalent to $b^*(T)$. Therein lay a considerable difficulty in discovering the proof of the Latała-Bednorz theorem: even though $b(T)$ and $\tilde b(T)$ are equivalent, it does not seem possible to directly construct the partition witnessing that $\tilde b(T)\le Lb(T)$.

Theorem 10.14.2 is a consequence of the Latała-Bednorz result and of the following facts:

Proposition 10.14.3 Consider $a>0$. The set $B_a=\{t\in\ell^1;\ \sum_{i\ge1}|t_i|\le a\}$ satisfies $\tilde b(B_a)\le Lra$.

Proposition 10.14.4 For a subset $T$ of $\ell^2$ one has $\tilde b(T)\le Lr\gamma_2(T)$.

Proposition 10.14.5 Recalling that $T+T'$ denotes the Minkowski sum of $T$ and $T'$, for $T,T'\subset\ell^2$ one has

$$\tilde b(T+T')\le Lr(\tilde b(T)+\tilde b(T')).\tag{10.172}$$

The proof of Proposition 10.14.3 is delayed until Sect. 19.2.1 because this result has little to do with probability and bears on the geometry of $B_a$. The proof of Proposition 10.14.4 should be obvious to the reader having reached this point. If it is not, please review the discussion around (9.46), and if this does not suffice, try to figure it out by solving the next exercise.


Exercise 10.14.6 Write the proof of Proposition 10.14.4 in complete detail. 


Proof of Proposition 10.14.5 We first observe that for a translation-invariant distance $d$, we have

$$d(s+s',t+t')\le d(s+s',t+s')+d(t+s',t+t')\le d(s,t)+d(s',t'),$$
so that since $\varphi_j$ is the square of a translation-invariant distance,
$$\varphi_j(s+s',t+t')\le2(\varphi_j(s,t)+\varphi_j(s',t')).\tag{10.173}$$
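Inequality (10.173) can also be seen coordinatewise: if $|a|\ge1$ or $|b|\ge1$ the right-hand side is at least 2, and otherwise $(a+b)^2\le2a^2+2b^2$. The following is a minimal numerical sanity check of (10.167) and (10.173), assuming $r=8$ and random Gaussian test vectors (both our choices for illustration; `phi` and `check_sum_inequality` are hypothetical helper names).

```python
import random

def phi(j, s, t, r=8.0):
    """phi_j(s, t) = sum_i min(|r^j (s_i - t_i)|^2, 1), the quantity in (10.167)."""
    return sum(min((r ** j * (a - b)) ** 2, 1.0) for a, b in zip(s, t))

def check_sum_inequality(trials=1000, dim=10, seed=0):
    """Test phi_j(s + s', t + t') <= 2 (phi_j(s, t) + phi_j(s', t')) on random vectors."""
    rng = random.Random(seed)
    for _ in range(trials):
        j = rng.randint(-3, 3)
        s, t, s2, t2 = ([rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(4))
        lhs = phi(j, [a + b for a, b in zip(s, s2)], [a + b for a, b in zip(t, t2)])
        rhs = 2 * (phi(j, s, t) + phi(j, s2, t2))
        if lhs > rhs + 1e-9:  # small tolerance for floating-point rounding
            return False
    return True
```

A call such as `phi(0, [0.5], [0.0])` returns `0.25`, and `check_sum_inequality()` is expected to return `True`.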


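For each $t\in T+T'$ let us pick in an arbitrary manner $u(t)\in T$ and $u'(t)\in T'$ with $t=u(t)+u'(t)$. For $A\subset T$, $A'\subset T'$ let us define

$$A*A'=\{t\in T+T';\ u(t)\in A,\ u'(t)\in A'\}.$$
According to the definition of $\tilde b$, there exist admissible sequences $(\mathcal A_n)$ and $(\mathcal A'_n)$ on $T$ and $T'$, respectively, and for $A\in\mathcal A_n$ and $A'\in\mathcal A'_n$, corresponding integers $j_n(A)$ and $j'_n(A')$ as in (10.168) and (10.169) with

$$\sup_{t\in T}\sum_{n\ge0}2^nr^{-j_n(A_n(t))}\le2\tilde b(T);\quad\sup_{t\in T'}\sum_{n\ge0}2^nr^{-j'_n(A'_n(t))}\le2\tilde b(T').\tag{10.174}$$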
Consider the family of subsets $\mathcal B_n$ of $T+T'$ consisting of the sets of the type $A*A'$ for $A\in\mathcal A_n$ and $A'\in\mathcal A'_n$. It obviously forms a partition of $T+T'$. The sequence $(\mathcal B_n)$ of partitions is increasing and $\operatorname{card}\mathcal B_n\le N_n^2\le N_{n+1}$. Also, for $t=u(t)+u'(t)\in T+T'$, we have $t\in A_n(u(t))*A'_n(u'(t))$, so $A_n(u(t))*A'_n(u'(t))$ is the element $B_n(t)$ of $\mathcal B_n$ containing $t$. For $B=A*A'\in\mathcal B_n$, we set

$$b_n(B)=\min(j_n(A),j'_n(A')).\tag{10.175}$$
Thus for $s,t\in B$, we have, using (10.173) and also, in the last inequality, (10.168) together with the fact that $\varphi_j$ increases with $j$,

$$\begin{aligned}\varphi_{b_n(B)}(s,t)&=\varphi_{b_n(B)}(u(s)+u'(s),u(t)+u'(t))\\&\le2\varphi_{b_n(B)}(u(s),u(t))+2\varphi_{b_n(B)}(u'(s),u'(t))\le2^{n+2}.\end{aligned}\tag{10.176}$$


Let us then define a sequence $(\mathcal C_n)$ of partitions of $T+T'$ by setting $\mathcal C_n=\mathcal B_{n-2}$ for $n\ge3$ and $\mathcal C_n=\{T+T'\}$ for $n\le2$. Obviously, this sequence of partitions is admissible. Let us further define $k_n(B)=b_{n-2}(B)$ for $B\in\mathcal C_n=\mathcal B_{n-2}$ with $n\ge3$ and $k_n(T+T')=\min(j_0(T),j'_0(T'))-1$ for $n\le2$. We will now check that the admissible sequence of partitions $(\mathcal C_n)$ together with the associated numbers $k_n(C)$ witness that $\tilde b(T+T')\le Lr(\tilde b(T)+\tilde b(T'))$. First, for any $t\in T+T'$ we have, using (10.174) in the last inequality,

$$\begin{aligned}\sum_{n\ge0}2^nr^{-k_n(C_n(t))}&\le Lr^{-\min(j_0(T),j'_0(T'))+1}+\sum_{n\ge3}2^nr^{-b_{n-2}(B_{n-2}(t))}\\&\le Lr\,r^{-\min(j_0(T),j'_0(T'))}+4\sum_{n\ge1}2^nr^{-b_n(B_n(t))}\\&\le Lr\sum_{n\ge0}2^n\big(r^{-j_n(A_n(u(t)))}+r^{-j'_n(A'_n(u'(t)))}\big)\\&\le Lr(\tilde b(T)+\tilde b(T')).\end{aligned}\tag{10.177}$$


Next, recalling (10.169), and using that $r\ge2$ in the last inequality,

$$\begin{aligned}\Delta(T+T',d_2)&\le\Delta(T,d_2)+\Delta(T',d_2)\le r^{-j_0(T)}+r^{-j'_0(T')}\\&\le2r^{-\min(j_0(T),j'_0(T'))}\le r^{-k_0(T+T')},\end{aligned}\tag{10.178}$$

and we have checked that the sequence $(\mathcal C_n)$ satisfies (10.169). It remains to check (10.168). We use the inequalities $\varphi_j(s,t)\le r^{2j}d_2(s,t)^2$ to obtain that for $s,t\in T+T'$ and $n\le2$, we have

$$\varphi_{k_n(T+T')}(s,t)\le r^{2k_0(T+T')}\Delta(T+T',d_2)^2\le1\le2^n,$$
since $k_n(T+T')=k_0(T+T')$ and using (10.178). That is, for $n\le2$ and $s,t\in C\in\mathcal C_n$, we have $\varphi_{k_n(C)}(s,t)\le2^n$. This is also true for $n\ge3$ because for $C\in\mathcal C_n$, we have $C\in\mathcal B_{n-2}$ and $k_n(C)=b_{n-2}(C)$, using (10.176). Thus we have also checked (10.168), and (10.172) follows from (10.177). □
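The counting part of the proof rests on $N_0=1$ and $N_n=2^{2^n}$ for $n\ge1$, so that $N_n^2\le N_{n+1}$, and the diameter step (10.178) rests on $2r^{-m}\le r^{-(m-1)}$ when $r\ge2$. A minimal sketch checking both with exact integer and rational arithmetic follows; the helper names and the ranges of $n$, $r$ and $m$ tested are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

def N(n):
    """N_0 = 1 and N_n = 2**(2**n) for n >= 1 (bound on card A_n for admissible sequences)."""
    return 1 if n == 0 else 2 ** (2 ** n)

def card_B_bound(n):
    """card B_n <= card A_n * card A'_n <= N_n ** 2 for the partition into sets A * A'."""
    return N(n) ** 2

def card_C_bound(n):
    """C_n = {T + T'} for n <= 2 and C_n = B_{n-2} for n >= 3."""
    return 1 if n <= 2 else card_B_bound(n - 2)

def check_admissible(n_max=8):
    """card B_n <= N_{n+1} and card C_n <= N_n, i.e. (C_n) is admissible."""
    return all(card_B_bound(n) <= N(n + 1) and card_C_bound(n) <= N(n)
               for n in range(n_max + 1))

def check_diameter_step(r_values=range(2, 11), m_values=range(-5, 6)):
    """Last step of (10.178): 2 r^{-m} <= r^{-(m-1)} for r >= 2, with m = min(j_0(T), j'_0(T'))."""
    return all(2 * Fraction(r) ** (-m) <= Fraction(r) ** (-(m - 1))
               for r in r_values for m in m_values)
```

Both `check_admissible()` and `check_diameter_step()` are expected to return `True`; the case $r=2$ is the equality case of the second check.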


10.15 Lower Bounds from Measures 


At this stage we advise the reader to review Sect. 3.3, as the main result of the present section, Theorem 10.15.1, is closely connected to Proposition 3.3.1. Given a set $T\subset\ell^2$ and a probability measure $\mu$ on $T$, we are now going to provide a lower bound for $b(T)$ using $\mu$. This will be very useful later, in the study of certain random series in Chap. 11. We define $j_0$ to be the largest integer $j$ such that
$$\forall s,t\in T,\quad\varphi_j(s,t)\le1.\tag{10.179}$$

Thus we can find $s,t\in T$ with $\varphi_{j_0+1}(s,t)>1$, and since $\varphi_{j_0+1}(s,t)\le r^{2(j_0+1)}d_2(s,t)^2$, we have $r^{j_0+1}\Delta(T,d_2)\ge1$ and

$$r^{-j_0-1}\le\Delta(T,d_2).\tag{10.180}$$
Given $t\in T$ we define $j_0(t)=j_0$, and for $n\ge1$ we define
$$j_n(t)=\sup\{j\in\mathbb Z;\ \mu(\{s\in T;\ \varphi_j(t,s)\le2^n\})\ge N_n^{-1}\},\tag{10.181}$$
so that the sequence $(j_n(t))_{n\ge0}$ increases. We then set

$$I_\mu(t)=\sum_{n\ge0}2^nr^{-j_n(t)}.\tag{10.182}$$
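For a finite set $T$ the quantities (10.179), (10.181) and (10.182) can be computed directly. Since $\varphi_j$ increases with $j$, the balls $\{s;\ \varphi_j(t,s)\le2^n\}$ shrink as $j$ grows, so the supremum in (10.181) can be found by scanning $j$ downwards. The following is a minimal sketch, assuming a finite search window for $j$ (the constants `J_MIN`, `J_MAX` are hypothetical, and reaching `J_MAX` is treated as $j=+\infty$), a measure given by a list of positive weights, and $N_n=2^{2^n}$.

```python
import math

J_MIN, J_MAX = -40, 40  # hypothetical search window; reaching J_MAX is treated as +infinity

def phi(j, s, t, r):
    """phi_j(s, t) = sum_i min(|r^j (s_i - t_i)|^2, 1), as in (10.167)."""
    return sum(min((r ** j * (a - b)) ** 2, 1.0) for a, b in zip(s, t))

def j_zero(T, r):
    """Largest j with phi_j(s, t) <= 1 for all s, t in T, as in (10.179)."""
    for j in range(J_MAX, J_MIN - 1, -1):
        if all(phi(j, s, t, r) <= 1.0 for s in T for t in T):
            return math.inf if j == J_MAX else j
    raise ValueError("enlarge the search window")

def j_n(n, t, T, mu, r, j0):
    """j_n(t) from (10.181): largest j with mu({s: phi_j(t, s) <= 2^n}) >= 1/N_n; j_0(t) = j_0."""
    if n == 0:
        return j0
    threshold = 2.0 ** (-(2 ** n))  # 1/N_n; for large n this underflows harmlessly to 0.0
    for j in range(J_MAX, J_MIN - 1, -1):
        mass = sum(m for s, m in zip(T, mu) if phi(j, t, s, r) <= 2 ** n)
        if mass >= threshold:
            return math.inf if j == J_MAX else j
    raise ValueError("enlarge the search window")

def I_mu(t, T, mu, r):
    """I_mu(t) = sum_{n >= 0} 2^n r^{-j_n(t)} from (10.182); stops once j_n(t) = +infinity."""
    j0 = j_zero(T, r)
    total, n = 0.0, 0
    while True:
        j = j_n(n, t, T, mu, r, j0)
        if j == math.inf:  # j_n(t) is non-decreasing, so all later terms vanish too
            return total
        total += 2 ** n * r ** (-j)
        n += 1
```

The loop in `I_mu` terminates because every point has positive mass: once $N_n^{-1}\le\mu(\{t\})$, the ball always contains $t$ itself and $j_n(t)=+\infty$, which mirrors the role of the condition $\mu(\{t\})>0$ used later in Chap. 11.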


Theorem 10.15.1 For any probability measure $\mu$ on $T$, we have
$$\int_TI_\mu(t)\,d\mu(t)\le Lr^3b(T).\tag{10.183}$$

According to (10.171) it suffices to prove the following:

Proposition 10.15.2 Given a probability measure $\mu$ on $T$, we have
$$\int_TI_\mu(t)\,d\mu(t)\le Lr\tilde b(T).\tag{10.184}$$


Proof Consider an admissible sequence $(\mathcal A_n)$ of partitions of $T$. For $A\in\mathcal A_n$, consider an integer $j_n(A)$ as in (10.168) and (10.169), and let $S=\sup_{t\in T}\sum_{n\ge0}2^nr^{-j_n(A_n(t))}$. Comparing (10.169) and (10.180) yields $r^{-j_0-1}\le r^{-j_0(T)}=r^{-j_0(A_0)}\le S$. Consider $n\ge1$ and $A\in\mathcal A_n$. Then for $t\in A$ we have

$$A\subset\{s\in T;\ \varphi_{j_n(A)}(s,t)\le2^n\}\subset\{s\in T;\ \varphi_{j_n(A)}(s,t)\le2^{n+1}\}.$$

Thus, by the definition (10.181) of $j_{n+1}(t)$, if $\mu(A)\ge N_{n+1}^{-1}$ then $j_{n+1}(t)\ge j_n(A)$ and thus

$$\int_A2^{n+1}r^{-j_{n+1}(t)}\,d\mu(t)\le2\int_A2^nr^{-j_n(A_n(t))}\,d\mu(t).\tag{10.185}$$
On the other hand if $\mu(A)<N_{n+1}^{-1}$ then, since $j_{n+1}(t)\ge j_0(t)=j_0$,
$$\int_A2^{n+1}r^{-j_{n+1}(t)}\,d\mu(t)\le2^{n+1}r^{-j_0}N_{n+1}^{-1}.$$

Summation over all $A\in\mathcal A_n$ implies (using in the second term that $\operatorname{card}\mathcal A_n\le N_n$ and $N_nN_{n+1}^{-1}\le N_n^{-1}$)

$$\int_T2^{n+1}r^{-j_{n+1}(t)}\,d\mu(t)\le2\int_T2^nr^{-j_n(A_n(t))}\,d\mu(t)+2^{n+1}r^{-j_0}N_n^{-1}.$$

Summation over $n\ge1$ implies

$$\int_T\sum_{n\ge2}2^nr^{-j_n(t)}\,d\mu(t)\le LS+Lr^{-j_0}.$$

Since $2^nr^{-j_n(t)}\le Lr^{-j_0}\le LrS$ for $n=0$ or $n=1$, we have proved that $\int_TI_\mu(t)\,d\mu(t)\le LrS$, which proves the result by definition of $\tilde b(T)$. □


One piece is missing from our understanding of Bernoulli processes. In the case of a metric space $(T,d)$, one knows how to identify simple structures (trees), the presence of which provides a lower bound on $\gamma_2(T,d)$. One then can dream of identifying geometric structures inside a set $T\subset\ell^2$ which would provide lower bounds for $b(T)$ of the correct order. Maybe this dream is impossible to achieve, and not the least remarkable feature of the Latała-Bednorz proof of Theorem 6.2.8 is that it completely bypasses this problem. The following exercise gives an example of such a structure:

Exercise 10.15.3 For $n\ge1$ let $J_n=\{0,1\}^n$ and let $J=\bigcup_{n\ge1}J_n$. For $\sigma\in\{0,1\}^{\mathbb N^*}$ let us denote by $\sigma|n\in J_n$ the restriction of the sequence to its first $n$ terms. Consider a sequence $(\alpha_n)$ with $\alpha_n>0$ and $\sum_{n\ge1}\alpha_n=1$. Consider the set $T\subset\ell^2(J)$ consisting of the elements $x=(x_i)_{i\in J}$ such that there exists $\sigma\in\{0,1\}^{\mathbb N^*}$ for which $x_i=\alpha_n$ if $i\in J_n$ and $i=\sigma|n$, and $x_i=0$ otherwise. Prove that $\mathrm E\sup_{x\in T}\sum_{i\in J}\varepsilon_ix_i\ge3/4$.
The following direction of investigation is related to Proposition 3.4.1:

Research Problem 10.15.4 Consider $T\subset\ell^2$. Can we find a probability measure $\mu$ on $T$ such that
$$b(T)\le L\inf_{t\in T}I_\mu(t)\ ?$$


Key Ideas to Remember

• The main difficulty in proving the Latała-Bednorz theorem is that we know little about Bernoulli processes unless we control their supremum norm. Such a control is required in particular to use the Sudakov minoration for these processes.

• To control the supremum norm, the main technical tool is chopping maps, which replace the process by a related process with a better control of the supremum norm.

• The strategy to prove the Latała-Bednorz theorem is to recursively construct increasing sequences of partitions of the index set. The size of each element of the partition is controlled by the value of a functional, which however depends on the element of the partition itself.

• Compared with the case of Gaussian processes, a fundamentally new partitioning principle is required, Latała's principle. Applying this partitioning principle requires changing the chopping map.

• The difficulty of the construction is to ensure that no essential information is lost at any stage.

• There is a natural geometric characteristic of a set $T\subset\ell^2$ equivalent to the size of the corresponding Bernoulli process. This geometric property involves the existence of an admissible sequence of partitions of the index set with precise smallness properties with respect to the canonical family of distances.

• The existence of such an admissible sequence of partitions in turn controls how "scattered" a probability measure on the index set can be. This property will be the key to using the Latała-Bednorz theorem.


10.16 Notes and Comments 


I worked many years on the Bernoulli conjecture. The best I could prove is that if $p>1$, for any set $T\subset\ell^2$ we can write $T\subset T_1+T_2$ with $\gamma_2(T_1)\le K(p)$ and $T_2\subset K(p)B_p$, where $B_p$ is the unit ball of $\ell^p$. This statement has non-trivial applications to Banach space theory: it is sufficient to prove Theorem 19.1.5, but it is otherwise not very exciting. Nonetheless many results presented in Part II were results of efforts in this general direction.


Chapter 11
Random Series of Functions


As in the case of Chap. 7, the title of the chapter is somewhat misleading: our focus 
is not on the convergence of series, but on quantitative estimates bearing on sums of 
random functions. 


11.1 Road Map 


There are two fundamentally different reasons why a sum of random functions is not too large.¹

• There is a lot of cancellation between the different terms.
• The sum of the absolute values of the functions is not too large.

One may of course also have mixtures of the previous two situations. Under rather general circumstances, we will prove the very striking fact that there are no other possibilities: every situation is a mixture of the previous two situations. Furthermore we will give an exact quantitative description of the cancellation, by exhibiting a chaining method which witnesses it.

Let us describe more precisely one of our central results in this direction, which is a vast generalization of Theorem 6.8.3.² (The central result of this chapter, Theorem 11.10.3, is an abstract version of Theorem 11.1.1 which is conceptually very close.) Consider independent r.v.s $(X_i)_{i\le N}$ valued in a measurable space $\Omega$, and denote by $\lambda_i$ the distribution of $X_i$. Set $\nu=\sum_{i\le N}\lambda_i$. We consider a set $T$ of functions on $\Omega$, and we denote $s,t,\ldots$ the elements of $T$. We denote by $d_2$ and $d_\infty$ the distances on $T$ corresponding to the $L^2$ and $L^\infty$ norms for $\nu$.

¹ We have already seen this idea in Sect. 6.8 in the setting of empirical processes.

² Not only is the proof of this generalization identical to the proof of Theorem 6.8.3, but the generalization is also powerful, as we will experience in Sect. 11.12.


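Theorem 11.1.1 (The Decomposition Theorem for Empirical Processes) There is a decomposition $T\subset T_1+T_2$ such that

$$\gamma_2(T_1,d_2)+\gamma_1(T_1,d_\infty)\le L\,\mathrm E\sup_{t\in T}\sum_{i\le N}\varepsilon_it(X_i)\tag{11.1}$$
and
$$\mathrm E\sup_{t\in T_2}\sum_{i\le N}|t(X_i)|\le L\,\mathrm E\sup_{t\in T}\sum_{i\le N}\varepsilon_it(X_i).\tag{11.2}$$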


To explain why this result fits in the previous conceptual framework, let us lighten notation by setting

$$S(T)=\mathrm E\sup_{t\in T}\sum_{i\le N}\varepsilon_it(X_i),$$

and let us observe that when $T\subset T_1+T_2$, we have $S(T)\le S(T_1)+S(T_2)$. When $T\subset T_1+T_2$ we may think of $T$ as a mixture of the sets $T_1$ and $T_2$, and to control $S(T)$ it suffices to control $S(T_1)$ and $S(T_2)$. There is a very clear reason why for $t\in T_2$ the sums $\sum_{i\le N}\varepsilon_it(X_i)$ are not too large: it is because already the sums $\sum_{i\le N}|t(X_i)|$ are not too large. This is the content of (11.2).
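The observation $S(T)\le S(T_1)+S(T_2)$ holds draw by draw: for each fixed realization of $(X_i)$ and $(\varepsilon_i)$, the supremum over $T_1+T_2$ of the linear functional $t\mapsto\sum_i\varepsilon_it(X_i)$ is the sum of the two suprema. The following is a minimal Monte Carlo sketch, assuming a hypothetical model in which the $X_i$ are i.i.d. standard normal and $T_1$, $T_2$ are tiny families of functions chosen only for illustration; reusing the same seed for every set (common random numbers) makes the comparison exact up to rounding.

```python
import random

def estimate_S(T, N, n_draws=2000, seed=0):
    """Monte Carlo estimate of S(T) = E sup_{t in T} sum_{i <= N} eps_i t(X_i).

    Hypothetical model: X_i i.i.d. standard normal; T is a finite list of functions.
    The same seed reproduces the same draws (X_i, eps_i) for every T.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        xs = [rng.gauss(0.0, 1.0) for _ in range(N)]
        eps = [rng.choice((-1, 1)) for _ in range(N)]
        total += max(sum(e * f(x) for e, x in zip(eps, xs)) for f in T)
    return total / n_draws

T1 = [lambda x, a=a: a * x for a in (0.5, 1.0)]
T2 = [lambda x, b=b: b * x * x for b in (-1.0, 1.0)]
T12 = [lambda x, f=f, g=g: f(x) + g(x) for f in T1 for g in T2]  # Minkowski sum T1 + T2
T = T12[:3]                                                        # a subset of T1 + T2
```

With `N=20`, the estimate of `S(T12)` should coincide with `S(T1) + S(T2)` up to floating-point rounding, and the estimate for the subset `T` cannot exceed it.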

To understand what happens for $T_1$, we recall the following fundamental bound:

Lemma 11.1.2 We have $S(T)\le L(\gamma_2(T,d_2)+\gamma_1(T,d_\infty))$.

Proof This follows from Bernstein's inequality (4.44) and Theorem 4.5.13 just as in the case of Theorem 4.5.16. □

Thus the information $\gamma_2(T_1,d_2)+\gamma_1(T_1,d_\infty)\le LS(T)$ of (11.1) is exactly what we need to prove that chaining controls $S(T_1)$.

Since this chaining is obtained through Bernstein's inequality, and since no cancellation is needed to explain the size of $S(T_2)$, we may picturesquely formulate Theorem 11.1.1 as follows: chaining using Bernstein's inequality captures all the cancellation.

Despite the fact that Theorem 11.1.1 is of obvious theoretical importance, one 
must keep realistic expectations. The theorem does not contain a practical recipe to 
find the decomposition. In practical situations, such as those studied in Chap. 14, it 
is the part without cancellations which is difficult to control, and the cancellations 
are easily controlled through chaining (just as expected from Theorem 11.1.1). 

The reader should review Sect. 6.8 now as well as Chap. 7, at least up to Sect. 7.7. 
11.2 Random Series of Functions: General Setting


The setting in which we will work is more general than the setting of Theorem 11.1.1, and we describe it first. We consider an index set $T$ and a random sequence $(Z_i)_{i\ge1}$ of functions on $T$. We do not assume that this sequence is independent.³ Consider an independent Bernoulli sequence $(\varepsilon_i)_{i\ge1}$, which is independent of the sequence $(Z_i)_{i\ge1}$. We are interested in the random sums $\sum_{i\ge1}\varepsilon_iZ_i(t)$. We will measure their "size" by the quantity

$$S:=\mathrm E\sup_{t\in T}\sum_{i\ge1}\varepsilon_iZ_i(t).\tag{11.3}$$

The crucial technical feature here is that given the randomness of the $Z_i$, we are considering a Bernoulli process. The most important case is when the sum in (11.3) is a finite sum (i.e., $Z_i=0$ for $i$ large enough). In the next chapter we will also consider situations where the sum in (11.3) is infinite, and we will then consider actual series of functions. To make sure that in this case the series (11.3) converges a.s., we assume

$$\forall t\in T,\quad\sum_{i\ge1}\mathrm E(|Z_i(t)|^2\wedge1)<\infty,\tag{11.4}$$

a condition which is automatically satisfied when $Z_i=0$ for $i$ large enough.

Our main technical tool is that to each random sequence of functions, we can naturally associate a family of distances, and we explain this now. We fix a number $r\ge4$, a number $j\in\mathbb Z$ and a given realization of the sequence $(Z_i)_{i\ge1}$. We then consider the quantities

$$\psi_{j,\omega}(s,t)=\sum_{i\ge1}|r^j(Z_i(s)-Z_i(t))|^2\wedge1.\tag{11.5}$$

In this notation $\omega$ symbolizes the randomness of the sequence $(Z_i)_{i\ge1}$. We also define (please compare to (7.63))

$$\varphi_j(s,t)=\mathrm E\psi_{j,\omega}(s,t)=\sum_{i\ge1}\mathrm E(|r^j(Z_i(s)-Z_i(t))|^2\wedge1),\tag{11.6}$$

which is finite for each $j,s,t$ by (11.4). This is the "family of distances" we will use to control the size of $T$.

³ Rather, we assume condition (11.8), which, as Lemma 11.2.1 shows, holds when the sequence $(Z_i)$ is independent, but also in other cases which will be essential in Chap. 12.
It follows from Lebesgue's convergence theorem and (11.4) that

$$\forall s,t\in T,\quad\lim_{j\to-\infty}\varphi_j(s,t)=0.\tag{11.7}$$

We will make the following (essential) additional hypothesis:
$$\forall j\in\mathbb Z,\ \forall s,t\in T,\quad\mathrm P(\psi_{j,\omega}(s,t)\le\varphi_j(s,t)/4)\le\exp(-\varphi_j(s,t)/4).\tag{11.8}$$

Lemma 11.2.1 The condition (11.8) is satisfied when the r.v.s $Z_i$ are independent.

Proof This follows from Lemma 7.7.2(a), used for $W_i=|r^j(Z_i(s)-Z_i(t))|^2\wedge1$ and $A=\varphi_j(s,t)/4=(1/4)\sum_i\mathrm EW_i$. □

In Chap. 12 we will, however, meet slightly different situations where (11.8) is satisfied. Our main result will imply that under the previous conditions, an upper bound on $S$ implies the existence of an admissible sequence of partitions of $T$ whose size is suitably controlled by the $\varphi_j$.


11.3 Organization of the Chapter 


The key result of the chapter, Theorem 11.7.1, states that a control of $S=\mathrm E\sup_{t\in T}\sum_{i\ge1}\varepsilon_iZ_i(t)$ from above implies a kind of smallness of the index space $T$, in the form of the existence of a family of admissible partitions which is suitably small with respect to the family of distances $(\varphi_j)$. It can be viewed as a lower bound for $S$. It can be seen as a generalization of Theorem 7.5.1 (or more precisely of (7.72)) to the case where one no longer has translation invariance.⁴ It can also be seen as a generalization of Theorem 5.2.1. The main motivation of the author for presenting separately the results of Chap. 5 is actually to prepare the reader for the scheme of proof of Theorem 11.7.1, so the reader should review that chapter.

As in the case of Theorem 5.2.1, and in contrast with the situation of Chap. 7, the lower bound of Theorem 11.7.1 is by no means an upper bound. One should however refrain from the conclusion that this lower bound is "weak". In a precise sense, it is optimal. In the setting of Theorem 11.1.1, it contains the exact information needed to obtain (11.1), so that one could say that it contains the exact information needed to perform chaining witnessing whatever cancellation occurs between the terms of the random sum $\sum_{i\ge1}\varepsilon_iZ_i$. What makes the situation subtle is that it may very well happen that such cancellation does not really play a role in the size of the random sum, that is, the sum $\sum_{i\ge1}\varepsilon_iZ_i$ is not large simply because the larger sum $\sum_{i\ge1}|Z_i|$ is not large. In this case the lower bound of Theorem 11.7.1 brings no information, but we have another precious piece of information, namely, that the sum $\sum_{i\ge1}|Z_i|$ is not large.

⁴ So sequences of admissible partitions replace the "entropy numbers" implicitly used in (7.72), as explained in Exercise 7.5.4.

The main tool of the proof of Theorem 11.7.1 is the Latała-Bednorz theorem (Theorem 6.2.8). There is no question that this proof is rather difficult, so the reader may like, after understanding the statement of the theorem, to study as a motivation Sect. 11.8, where one learns to gain control of sums such as $\sum_{i\ge1}|Z_i|$, and Sect. 11.9, where one proves Theorem 11.1.1.

The next three sections each contain a step of the proof of Theorem 11.7.1. Each of these steps corresponds quite closely to a step in the proof of Theorem 5.2.1, and this should help the reader to perceive the structure of the proof. In Sect. 11.10 we prove a decomposition theorem for random series which extends Theorem 11.1.1, and in the final section, we provide a spectacular application.


11.4 The Main Lemma 


The reader should review the proof of Lemma 5.4.2 and of (5.17). We consider a random series of functions as in the previous section. We assume that $T$ is finite and we keep (11.7) in mind. We define⁵

$$j_0=\sup\{j\in\mathbb Z;\ \forall s,t\in T,\ \varphi_j(s,t)\le4\}\in\mathbb Z\cup\{\infty\}.\tag{11.9}$$

We denote by $\mathcal M^+$ the set of probability measures $\mu$ on $T$ such that $\mu(\{t\})>0$ for each $t$ in $T$. Given $\mu\in\mathcal M^+$ and $t\in T$, we define

$$j_0(t)=j_0\tag{11.10}$$
and for $n\ge1$ we define
$$j_n(t)=\sup\{j\in\mathbb Z;\ \mu(B_j(t,2^n))\ge N_n^{-1}\}\in\mathbb Z\cup\{\infty\},\tag{11.11}$$
where as usual $B_j(t,2^n)=\{s\in T;\ \varphi_j(s,t)\le2^n\}$. This should be compared to (7.70). Thus the sequence $(j_n(t))_{n\ge0}$ increases, and $j_n(t)=\infty$ whenever $\mu(\{t\})\ge N_n^{-1}$, and in particular for $n$ large enough. We define⁶

$$J_\mu(t)=\sum_{n\ge0}2^nr^{-j_n(t)}.\tag{11.12}$$

Our goal in this section is to prove the following, where we recall (11.3):

⁵ It may well happen that $\varphi_j(s,t)\le4$ for all $j$, for example, if $Z_i=0$ for $i\ge5$.
⁶ The point of assuming $\mu(\{t\})>0$ is to ensure that $J_\mu(t)<\infty$.
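Lemma 11.4.1 For each probability measure $\mu\in\mathcal M^+$, we have

$$\int J_\mu(t)\,d\mu(t)\le KS.\tag{11.13}$$
Here and below, $K$ depends on $r$ only. To prove Lemma 11.4.1, we define
$$j_{0,\omega}=\sup\{j\in\mathbb Z;\ \forall s,t\in T,\ \psi_{j,\omega}(s,t)\le1\}\in\mathbb Z\cup\{\infty\},$$

and we set $j_{0,\omega}(t)=j_{0,\omega}$. For $n\ge1$ we define

$$j_{n,\omega}(t)=\sup\{j\in\mathbb Z;\ \mu(\{s\in T;\ \psi_{j,\omega}(t,s)\le2^n\})\ge N_n^{-1}\}\in\mathbb Z\cup\{\infty\}.$$
Then the sequence $(j_{n,\omega}(t))_{n\ge0}$ increases. We define

$$I_{\mu,\omega}(t)=\sum_{n\ge0}2^nr^{-j_{n,\omega}(t)}.\tag{11.14}$$

Given $\omega$ (i.e., given the sequence $(Z_i)_{i\ge1}$) the process $X_t:=\sum_{i\ge1}\varepsilon_iZ_i(t)$ is a Bernoulli process, so that using Theorem 10.15.1 at a given $\omega$, and denoting by $\mathrm E_\varepsilon$ expectation in the $\varepsilon_i$ only, we obtain

$$\int I_{\mu,\omega}(t)\,d\mu(t)\le K\,\mathrm E_\varepsilon\sup_{t\in T}\sum_{i\ge1}\varepsilon_iZ_i(t),$$

and taking expectation yields
$$\mathrm E\int I_{\mu,\omega}(t)\,d\mu(t)\le KS.\tag{11.15}$$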


Lemma 11.4.2 For each $t$ we have $J_\mu(t)\le L\,\mathrm EI_{\mu,\omega}(t)$.

Proof The proof is very similar to the proof of Lemma 7.7.3 (but is easier as we prove a weaker result). We will prove that

$$\mathrm P(j_{0,\omega}(t)\le j_0(t))=\mathrm P(j_{0,\omega}\le j_0)\ge1/L,\tag{11.16}$$
$$n\ge3\Rightarrow\mathrm P(j_{n-3,\omega}(t)\le j_n(t))\ge1/2.\tag{11.17}$$
These relations imply respectively that $\mathrm Er^{-j_{0,\omega}}\ge r^{-j_0}/L$ and that

$$n\ge3\Rightarrow\mathrm E\,2^{n-3}r^{-j_{n-3,\omega}(t)}\ge2^{n-4}r^{-j_n(t)}.$$

Summing these relations implies $\sum_{n\ge3}2^nr^{-j_n(t)}\le L\,\mathrm E\sum_{n\ge0}2^nr^{-j_{n,\omega}(t)}=L\,\mathrm EI_{\mu,\omega}(t)$. Since the sequence $(j_n(t))$ increases, for $n\le2$ we have $2^nr^{-j_n(t)}\le4r^{-j_0}\le L\,\mathrm Er^{-j_{0,\omega}}\le L\,\mathrm EI_{\mu,\omega}(t)$, and this completes the proof.

We first prove (11.16). There is nothing to prove if $j_0=\infty$. Otherwise, by definition of $j_0$, there exist $s,t\in T$ with $\varphi_{j_0+1}(s,t)>4$, and by (11.8) we have

$$\mathrm P(\psi_{j_0+1,\omega}(s,t)>1)\ge1-1/e.$$
When $\psi_{j_0+1,\omega}(s,t)>1$, by definition of $j_{0,\omega}$, we have $j_{0,\omega}\le j_0$, and we have proved that $\mathrm P(j_{0,\omega}\le j_0)\ge1-1/e$.


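We turn to the proof of (11.17). There is nothing to prove if $j_n(t)=\infty$. Otherwise, by definition of $j_n(t)$ we have

$$\mu(\{s\in T;\ \varphi_{j_n(t)+1}(t,s)\le2^n\})<\frac1{N_n}.\tag{11.18}$$

For $n\ge2$ it follows from (11.8) that
$$\varphi_{j_n(t)+1}(t,s)>2^n\Rightarrow\mathrm P(\psi_{j_n(t)+1,\omega}(t,s)\le2^{n-3})\le\exp(-2^{n-2})\le N_{n-2}^{-1},$$
so that
$$\mathrm E\mu(\{s\in T;\ \varphi_{j_n(t)+1}(t,s)>2^n,\ \psi_{j_n(t)+1,\omega}(t,s)\le2^{n-3}\})\le N_{n-2}^{-1},$$
and thus, by Markov's inequality, with probability $\ge1/2$ we have
$$\mu(\{s\in T;\ \varphi_{j_n(t)+1}(t,s)>2^n,\ \psi_{j_n(t)+1,\omega}(t,s)\le2^{n-3}\})\le2N_{n-2}^{-1}.$$

When this occurs, recalling (11.18) we obtain

$$\mu(\{s\in T;\ \psi_{j_n(t)+1,\omega}(t,s)\le2^{n-3}\})<\frac1{N_n}+\frac2{N_{n-2}}\le\frac1{N_{n-3}},$$

so that $j_{n-3,\omega}(t)\le j_n(t)$. □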
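The proof of (11.17) uses two purely numerical facts about $N_n$ ($N_0=1$, $N_n=2^{2^n}$ for $n\ge1$): $\exp(-2^{n-2})\le N_{n-2}^{-1}$, which holds because $e>2$, and $N_n^{-1}+2N_{n-2}^{-1}\le N_{n-3}^{-1}$ for $n\ge3$. A minimal sketch checking both for a range of $n$ follows; the first is compared on a logarithmic scale to avoid underflow, the second exactly with rationals, and the function name and the range of $n$ are our own choices.

```python
import math
from fractions import Fraction

def N(n):
    """N_0 = 1 and N_n = 2**(2**n) for n >= 1."""
    return 1 if n == 0 else 2 ** (2 ** n)

def markov_step_ok(n):
    """Check the numerical inequalities used in the proof of (11.17), for a given n >= 3."""
    # exp(-2^{n-2}) <= 1/N_{n-2}, i.e. log N_{n-2} <= 2^{n-2}, compared on a log scale
    first = -(2 ** (n - 2)) <= -math.log(N(n - 2))
    # 1/N_n + 2/N_{n-2} <= 1/N_{n-3}, checked exactly with rational arithmetic
    second = Fraction(1, N(n)) + Fraction(2, N(n - 2)) <= Fraction(1, N(n - 3))
    return first and second
```

For instance `all(markov_step_ok(n) for n in range(3, 13))` is expected to be `True`; at $n=3$ the second inequality reads $1/256+1/2\le1$.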


Proof of Lemma 11.4.1 Combine the previous lemma with (11.15). □


11.5 Construction of the Majorizing Measure Using Convexity


The reader should review Sect. 3.3.2. The goal of this section is to prove the following:

Theorem 11.5.1 Assume that $T$ is finite. Then there exists a probability measure $\mu$ on $T$ with

$$\sup_{t\in T}J_\mu(t)\le KS.\tag{11.19}$$

For historical reasons (which have been briefly explained in Chap. 3), we call the measure $\mu$ a majorizing measure, and this explains the title of the section. It will then be a simple technical task to use this measure for a fruitful application in Theorem 11.6.3.

It is interesting to compare (11.19) with (7.71), which basically asserts that
$$P\Big(\Big\|\sum_{i\ge1}\varepsilon_i Z_i\Big\| \ge \frac{1}{L}\sup_{t\in T} J_\mu(t)\Big) \ge \frac{1}{L} ,$$
where μ is the Haar probability on T. The previous inequality implies that $E\|\sum_{i\ge1}\varepsilon_i Z_i\| \ge \sup_{t\in T} J_\mu(t)/K$, and Theorem 11.5.1 can be seen as a generalization of this fact. To clarify the relationship further, it can be shown in the translation invariant setting that for any probability measure ν one has $\sup_{t\in T} J_\mu(t) \le L\sup_{t\in T} J_\nu(t)$, where μ is the Haar measure (which we may assume here to be a probability). Thus, when one probability measure satisfies (11.19), then the Haar measure also satisfies it.
Our approach to Theorem 11.5.1 is to combine Lemma 11.4.1 with Lemma 3.3.2.


Corollary 11.5.2 Assume that T is finite. Then there exist an integer M, probability measures $(\mu_i)_{i\le M}$ and numbers $(a_i)_{i\le M}$ with $a_i \ge 0$ and $\sum_{i\le M} a_i = 1$ such that
$$\forall t \in T ,\quad \sum_{i\le M} a_i J_{\mu_i}(t) \le KS .$$
Proof Consider the set $\mathcal S$ of functions of the type $f(t) = J_\mu(t)$ where $\mu \in \mathcal M^+$.⁷ Consider a probability measure ν on T and $\mu \in \mathcal M^+$ with $\mu \ge \nu/2$ (e.g., $\mu = \nu/2 + \lambda/2$ where λ is uniform over T). Then $\int J_\mu(t)\,d\nu(t) \le 2\int J_\mu(t)\,d\mu(t) \le KS$, using (11.13) in the last inequality. Thus by Lemma 3.3.2 there is a convex combination of functions of $\mathcal S$ which is $\le KS$. □


Theorem 11.5.1 is then a consequence of Corollary 11.5.2 and the next result. 


Lemma 11.5.3 Assume that T is finite and consider probability measures $(\mu_i)_{i\le M}$ on T and numbers $(a_i)_{i\le M}$ with $a_i \ge 0$ and $\sum_{i\le M} a_i = 1$. Then the probability measure $\mu := \sum_{i\le M} a_i\mu_i$ satisfies
$$\forall t \in T ,\quad J_\mu(t) \le L\sum_{i\le M} a_i J_{\mu_i}(t) . \tag{11.20}$$

⁷ The reason for which we require $\mu \in \mathcal M^+$ is to ensure that $f(t) < \infty$ for each t, so that f is a true function, as is required by our version of Lemma 3.3.2.


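Proof Let us fix $t \in T$. With obvious notation, for $n \ge 1$ let us define
$$U_n = \Big\{i \le M;\ r^{-j_{n,\mu_i}(t)} \le 2\sum_{s\le M} a_s r^{-j_{n,\mu_s}(t)}\Big\} .$$
Consider the probability measure P on $\{1,\dots,M\}$ such that $P(\{i\}) = a_i$ and the function f on $\{1,\dots,M\}$ given by $f(i) = r^{-j_{n,\mu_i}(t)}$. By Markov's inequality we have $P(f > 2\int f\,dP) \le 1/2$, so that $P(f \le 2\int f\,dP) \ge 1/2$, i.e., $\sum_{i\in U_n} a_i \ge 1/2$. For $n \ge 1$, let us denote by $j_n$ the smallest integer with $r^{-j_n} \le 2\sum_{s\le M} a_s r^{-j_{n,\mu_s}(t)}$. Then $j_{n,\mu_i}(t) \ge j_n$ for $i \in U_n$. Thus by definition of $j_{n,\mu_i}(t)$ we have $\mu_i(B_{j_n}(t,2^n)) \ge N_n^{-1}$ and consequently
$$\mu(B_{j_n}(t,2^n)) \ge N_n^{-1}\sum_{i\in U_n} a_i \ge \frac{1}{2N_n} \ge \frac{1}{N_{n+1}} ,$$
so that by definition $j_{n+1,\mu}(t) \ge j_n$. Thus (using also that $j_{0,\mu}(t) = j_0 = j_{0,\mu_i}(t)$, where $j_0$ is given by (11.9))
$$\sum_{n\ge0} 2^n r^{-j_{n,\mu}(t)} \le L\sum_{n\ge0} 2^n r^{-j_n} \le L\sum_{n\ge0} 2^n\sum_{i\le M} a_i r^{-j_{n,\mu_i}(t)} = L\sum_{i\le M} a_i J_{\mu_i}(t) . \qquad \square$$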


11.6 From Majorizing Measures to Partitions 


In the setting of one single distance, we had the inequality (3.41) to go from majorizing measures to sequences of partitions. We do not know how to generalize the proof of (3.41) to the setting of a family of distances, and we instead give an argument that passes directly from the existence of a majorizing measure to the existence of an appropriate increasing sequence of partitions. The very same argument was given in the case of one single distance in Sect. 3.3.3. Let us consider again the functions $\varphi_j$ as in (11.6), so they satisfy
$$\varphi_j : T\times T \to \mathbb R^+ ,\quad \varphi_j(t,t) = 0 ,\quad \varphi_j(s,t) = \varphi_j(t,s) .$$
Since the $\varphi_j$ are squares of distances, they also satisfy the property
$$\forall s,t,u \in T ,\quad \varphi_j(s,t) \le 2(\varphi_j(s,u) + \varphi_j(u,t)) , \tag{11.21}$$
and $\varphi_j(t,t) = 0$. We recall the notation $B_j(t,r) = \{s \in T;\ \varphi_j(t,s) \le r\}$. As a consequence of (11.21) we have the following:

Lemma 11.6.1 If $\varphi_j(s,t) > 4a > 0$ the balls $B_j(s,a)$ and $B_j(t,a)$ are disjoint.

(Indeed, a point u in both balls would give $\varphi_j(s,t) \le 2(a + a) = 4a$ by (11.21).)


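We assume that T is finite and we fix a probability measure μ on T.⁸ We assume that there is a $j_0 \in \mathbb Z$ with
$$\forall s,t \in T ,\quad \varphi_{j_0}(s,t) \le 4 . \tag{11.22}$$
We assume that for $t \in T$ and $n \ge 0$, we are given an integer $j_n(t) \in \mathbb Z$ with the following properties:
$$\forall t \in T ,\quad j_0(t) = j_0 , \tag{11.23}$$
$$\forall t \in T ,\ \forall n \ge 0 ,\quad j_n(t) \le j_{n+1}(t) \le j_n(t) + 1 , \tag{11.24}$$
$$\forall t \in T ,\ \forall n \ge 1 ,\quad \mu(B_{j_n(t)}(t,2^n)) \ge N_n^{-1} . \tag{11.25}$$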


Let us observe that we do not require that $j_n(t)$ be the largest possible integer which would satisfy (11.25). Rather, we require the technical condition (11.24). To understand that important point, we urge the reader to study the following exercise:

Exercise 11.6.2 Assume that T is a group, and μ is the Haar measure, a probability. Assume $r \ge 4$. Assume that $j_0$ satisfies (11.22) and for $n \ge 1$ consider the numbers $j_n$ defined as in (7.70). Prove that for $t \in T$ there exist numbers $j_n(t)$ satisfying (11.23) to (11.25) and for which $\sup_{t\in T}\sum_{n\ge0} 2^n r^{-j_n(t)} \le L\sum_{n\ge0} 2^n r^{-j_n}$.
Hint: $j_n(t) = \min_{p\le n}(n - p + j_p)$.

Theorem 11.6.3 Under the previous conditions there exists an admissible sequence of partitions $(\mathcal A_n)_{n\ge0}$ of T and for $A \in \mathcal A_n$ an integer $j_n(A) \in \mathbb Z$ such that
$$s,t \in A \in \mathcal A_n \Rightarrow \varphi_{j_n(A)}(s,t) \le 2^{n+2} \tag{11.26}$$
and that the following holds for any value of $r > 1$:
$$\forall t \in T ,\quad \sum_{n\ge0} 2^n r^{-j_n(A_n(t))} \le L\sum_{n\ge0} 2^n r^{-j_n(t)} . \tag{11.27}$$
In particular if $S^* := \sup_{t\in T}\sum_{n\ge0} 2^n r^{-j_n(t)}$, (11.27) implies
$$\sup_{t\in T}\sum_{n\ge0} 2^n r^{-j_n(A_n(t))} \le LS^* .$$
We note that (11.27) does not require a specific relation between the value of r and the $\varphi_j$, but this result will be interesting only when the relation (11.25) brings a precise control of $S^*$.

⁸ For reasons explained in Chap. 3, we think of μ as a "majorizing measure". This explains the title of this section.

The basic brick of the construction is the following partitioning lemma, in the spirit of (2.46):

Lemma 11.6.4 Consider a set $A \subset T$, an integer $j \in \mathbb Z$, and $n \ge 1$. Assume that $\mu(B_j(t,2^n)) \ge 1/N_n$ for each $t \in A$. Then we can find a partition $\mathcal A$ of A with $\operatorname{card}\mathcal A \le N_n$, such that for each $B \in \mathcal A$,
$$s,t \in B \Rightarrow \varphi_j(s,t) \le 2^{n+4} . \tag{11.28}$$
Proof Consider a subset U of A such that $\varphi_j(s,t) > 2^{n+2}$ for each $s,t \in U$, $s \ne t$. According to Lemma 11.6.1, the balls $B_j(t,2^n)$ for $t \in U$ are disjoint. These balls are of measure $\ge N_n^{-1}$, so their union has measure $\ge N_n^{-1}\operatorname{card}U$. Since the measure is a probability, this proves that $\operatorname{card}U \le N_n$. If $\operatorname{card}U$ is taken as large as possible, the balls $B_j(t,2^{n+2})$ centered at the points of U cover A. It follows from (11.21) that these balls satisfy (11.28). And A can be partitioned into at most $N_n$ pieces, each of which is contained in a ball $B_j(t,2^{n+2})$ where $t \in U$. □
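The proof of Lemma 11.6.4 is a greedy packing argument, and it can be run as such on a finite set. The following Python sketch is only an illustration under stated assumptions: a single function `phi` stands for $\varphi_j$ (here, hypothetically, the squared distance between integer points of a line, which satisfies (11.21)), the parameter `a` plays the role of $2^n$, and the names `greedy_partition`, `points`, `phi`, `a` are ours, not the book's. It checks Lemma 11.6.1 (the balls of radius `a` around distinct centers are disjoint) and the diameter bound $2(4a + 4a) = 16a$, i.e., $2^{n+4}$ in (11.28). The counting bound $\operatorname{card}U \le N_n$, which uses the measure μ, is not illustrated.

```python
from itertools import combinations


def greedy_partition(points, phi, a):
    """Partition `points` as in the proof of Lemma 11.6.4 (`a` stands for 2**n).

    Centers are chosen greedily so that distinct centers satisfy phi > 4a; the
    resulting set is maximal, so every point has a center within phi-distance
    4a, and each point is attached to the first such center.
    """
    centers = []
    for p in points:
        if all(phi(p, c) > 4 * a for c in centers):
            centers.append(p)
    pieces = {c: [] for c in centers}
    for p in points:
        c = next(c for c in centers if phi(p, c) <= 4 * a)
        pieces[c].append(p)
    return centers, list(pieces.values())


def phi(s, t):
    """Hypothetical phi_j: squared distance, so phi(s,t) <= 2(phi(s,u) + phi(u,t))."""
    return (s - t) ** 2


points = list(range(50))
a = 25
centers, pieces = greedy_partition(points, phi, a)

# Lemma 11.6.1: the balls {s : phi(c, s) <= a} around distinct centers are disjoint.
for c1, c2 in combinations(centers, 2):
    assert not any(phi(p, c1) <= a and phi(p, c2) <= a for p in points)

diameter = max(phi(s, t) for piece in pieces for s in piece for t in piece)
print(len(centers), diameter)  # prints 5 100; (11.28) only guarantees <= 16 * a
```

Integer points are used so that every comparison is exact.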


Proof of Theorem 11.6.3 We are going to construct the partitions $\mathcal A_n$ and for $A \in \mathcal A_n$ integers $j_n(A)$. Our construction will satisfy the following property: For $n \ge 2$, $A \in \mathcal A_n$, the integer $j_n(A)$ is such that
$$t \in A \Rightarrow j_{n-2}(t) = j_n(A) . \tag{11.29}$$
To start the construction, we set $\mathcal A_0 = \mathcal A_1 = \mathcal A_2 = \{T\}$, and for $n \le 2$ and $A \in \mathcal A_n$ we set $j_n(A) = j_0$.⁹ According to (11.23), for $t \in T$ we have $j_0(t) = j_0$, so that (11.29) holds for $n = 2$ because then $j_0(t) = j_0 = j_2(A)$.

The rest of the construction proceeds by recursion. Having constructed $\mathcal A_n$ for some $n \ge 2$ we proceed as follows. According to (11.29) we have $j_{n-2}(t) = j_n(A)$ for $t \in A$, so that according to (11.24), for $t \in A$ we have $j_{n-1}(t) \in \{j_n(A), j_n(A)+1\}$. We set
$$A_0 = \{t \in A;\ j_{n-1}(t) = j_n(A)\} ;\quad A_1 = \{t \in A;\ j_{n-1}(t) = j_n(A) + 1\} .$$
Recalling (11.25), we can then apply Lemma 11.6.4 with $n-1$ rather than n and $j = j_n(A) + 1$ to partition the set $A_1$ into $N_{n-1}$ pieces. According to (11.28), for each piece B of $A_1$ thus created, we have
$$s,t \in B \Rightarrow \varphi_{j_n(A)+1}(s,t) \le 2^{n+3} . \tag{11.30}$$
For each piece B of $A_1$ thus created, we set $j_{n+1}(B) = j_n(A) + 1$, so that this piece satisfies (11.26) (for $n+1$ rather than n). We do not partition $A_0$, and we set $j_{n+1}(A_0) = j_n(A)$. In this manner, we have partitioned A into at most $N_{n-1} + 1 \le N_n$ pieces. We apply this procedure to each element A of $\mathcal A_n$ to obtain $\mathcal A_{n+1}$. Then $\operatorname{card}\mathcal A_{n+1} \le N_n^2 = N_{n+1}$. Condition (11.29) is satisfied for $n+1$ by construction, and so is (11.26) (see (11.30)). As for (11.27), it follows from the fact that $j_n(A_n(t)) = j_0$ if $n \le 2$ and $j_n(A_n(t)) = j_{n-2}(t)$ if $n \ge 2$ (as follows from (11.29)). □

⁹ Since the only element of $\mathcal A_n$ is T, this means that $j_n(T) = j_0$ for $n \le 2$.


11.7 The General Lower Bound

Still in the setting of Sect. 11.2, we have the following, where T is now countable rather than finite:

Theorem 11.7.1 Assume that T is countable. Then there exists an admissible sequence $(\mathcal A_n)$ of partitions of T and for $A \in \mathcal A_n$ an integer $j_n(A)$ such that the following holds:
$$\forall t \in T ,\quad \sum_{n\ge0} 2^n r^{-j_n(A_n(t))} \le KS , \tag{11.31}$$
$$A \in \mathcal A_n ,\ C \in \mathcal A_{n-1} ,\ A \subset C \Rightarrow j_{n-1}(C) \le j_n(A) \le j_{n-1}(C) + 1 , \tag{11.32}$$
$$s,t \in A \in \mathcal A_n \Rightarrow \varphi_{j_n(A)}(s,t) \le 2^{n+2} . \tag{11.33}$$


Proof Assume first that T is finite. Consider the probability measure μ on T provided by Theorem 11.5.1 and the corresponding numbers $j_n(t)$. Let us define $j'_n(t) = \min_{0\le p\le n}(j_p(t) + n - p)$, so that $j'_0(t) = j_0(t) = j_0$ and $j'_n(t) \le j'_{n+1}(t) \le j'_n(t) + 1$. Since $j'_n(t) \le j_n(t)$, by definition (11.11) of $j_n(t)$ we have $\mu(B_{j'_n(t)}(t,2^n)) \ge N_n^{-1}$. Also, $r^{-j'_n(t)} \le \sum_{p\le n} r^{-j_p(t)-n+p}$, so that, since $r \ge 4$,
$$\sum_{n\ge0} 2^n r^{-j'_n(t)} \le \sum_{n\ge0} 2^n\sum_{p\le n} r^{-j_p(t)-n+p} = \sum_{p\ge0} 2^p r^{-j_p(t)}\sum_{n\ge p}\Big(\frac{2}{r}\Big)^{n-p} \le 2J_\mu(t) .$$


The result then follows from Theorem 11.6.3. 
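The regularization $j'_n(t) = \min_{0\le p\le n}(j_p(t) + n - p)$ used in the proof above is purely combinatorial, so its properties can be checked directly on a finite sequence. The following Python sketch does so with exact rational arithmetic; the input sequence `j` (assumed nondecreasing, as $(j_n(t))$ is), the choice `r = 4`, and the helper names are hypothetical. It checks $j'_0 = j_0$, $j'_n \le j_n$, $j'_n \le j'_{n+1} \le j'_n + 1$, and the bound $\sum_n 2^n r^{-j'_n} \le 2\sum_n 2^n r^{-j_n}$ restricted to the finitely many terms supplied.

```python
from fractions import Fraction


def regularize(j):
    """j'_n = min_{0 <= p <= n} (j_p + n - p), as in the proof of Theorem 11.7.1."""
    return [min(j[p] + n - p for p in range(n + 1)) for n in range(len(j))]


def chaining_sum(j, r):
    """Exact value of sum_n 2^n r^(-j_n) over the finitely many terms given."""
    return sum(Fraction(2) ** n * Fraction(r) ** (-jn) for n, jn in enumerate(j))


r = 4
j = [0, 0, 3, 3, 3, 10, 10]  # hypothetical nondecreasing sequence (j_n(t))
jp = regularize(j)
print(jp)  # prints [0, 0, 1, 2, 3, 4, 5]
print(float(chaining_sum(jp, r)), "<=", float(2 * chaining_sum(j, r)))
```

The regularized sequence can only grow by one at each step, which is exactly condition (11.24), while the geometric factor $(2/r)^{n-p}$ keeps the chaining sum within a factor 2.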

Assume next that T is countable. We write T as the union of an increasing sequence $(T_k)_{k\ge1}$ of finite sets. We apply the previous result to each $T_k$, obtaining an admissible sequence $(\mathcal A_{n,k})$ of partitions of $T_k$. We number in an arbitrary way the sets of $\mathcal A_{n,k}$ as $(A_{n,k,\ell})_{\ell\le N_n}$, and for $t \in T_k$ we denote by $\ell_{n,k}(t)$ the unique integer $\ell \le N_n$ such that $A_{n,k}(t) = A_{n,k,\ell}$. For each $t \in T$ and each $n \ge 0$, the integer $\ell_{n,k}(t)$ is defined for k large enough since then $t \in T_k$. We may then assume by taking a subsequence that this integer is eventually equal to a number $\ell_n(t)$. Also, since the sequence $(\mathcal A_{n,k})$ is admissible, for each $\ell \le N_n$ there exists $\ell' \le N_{n-1}$ such that $A_{n,k,\ell} \subset A_{n-1,k,\ell'}$. We denote this integer by $\ell'_{n,k}(\ell)$. We may also assume by taking a subsequence that $\ell'_{n,k}(\ell)$ is eventually equal to an integer $\ell'_n(\ell)$. For $\ell \le N_n$ define $A_{n,\ell} = \{t \in T;\ \ell_n(t) = \ell\}$. Obviously the sets $A_{n,\ell}$ for $\ell \le N_n$ define a partition $\mathcal A_n$ of T and $\operatorname{card}\mathcal A_n \le N_n$. Next, for $t \in A_{n,\ell}$, for large k we have $t \in A_{n,k,\ell} \subset A_{n-1,k,\ell'_{n,k}(\ell)} = A_{n-1,k,\ell'_n(\ell)}$. This proves that $A_{n,\ell} \subset A_{n-1,\ell'_n(\ell)}$, so that the sequence $(\mathcal A_n)$ is admissible. For $s,t \in T_k$, we have $\varphi_{j_0(T_k)}(s,t) \le 4$. If the sequence $(j_0(T_k))$ is not bounded, then for all $s,t \in T$ and each $j \in \mathbb Z$ we have $\varphi_j(s,t) \le 4$, and the result is trivial since (11.33) is automatically satisfied. Thus we may assume that the sequence $(j_0(T_k))_{k\ge1}$ stays bounded. Since $j_n(A_{n,k,\ell}) \le j_0(T_k) + n$ by (11.32), each sequence $(j_n(A_{n,k,\ell}))_{k\ge1}$ stays bounded. We may then assume that for each n and ℓ, this sequence is eventually equal to a number $j_n(A_{n,\ell})$. It is straightforward to check that these numbers satisfy the desired requirements. □


The hypothesis that T is countable is of course largely irrelevant, as the following exercise shows:

Exercise 11.7.2 Extend Theorem 11.7.1 to the case where T is provided with a metric such that T is separable¹⁰ and each function $\varphi_j(s,t)$ is continuous for this metric.


11.8 The Giné-Zinn Inequalities 


Before we proceed we must build our tool kit. In this section we change topic, and we investigate a number of simple but fundamental inequalities. The main inequality, (11.37), allows us to control $\sup_{t\in T}\sum_{i\le N}|t(X_i)|$.

We consider a class T of functions on a measurable space Ω. The elements of T will be denoted $s, t, \dots$. We consider independent r.v.s $(X_i)_{i\le N}$ valued in Ω.

We denote by $(\varepsilon_i)_{i\le N}$ an independent sequence of Bernoulli r.v.s, independent of the sequence $(X_i)_{i\le N}$. We lighten notation by writing
$$S(T) = E\sup_{t\in T}\Big|\sum_{i\le N}\varepsilon_i t(X_i)\Big| . \tag{11.34}$$
The reader should not be disturbed by the fact that there are absolute values in (11.34) but not in (11.3). This is a purely technical matter, and in the key situations below, the absolute values do not matter by Lemma 2.2.1.


¹⁰ That is, there is a countable subset of T which is dense in T.

Lemma 11.8.1 We have
$$E\sup_{t\in T}\Big|\sum_{i\le N}(t(X_i) - Et(X_i))\Big| \le 2S(T) . \tag{11.35}$$
Proof Consider an independent copy $(X'_i)_{i\le N}$ of the sequence $(X_i)_{i\le N}$. Then, using Jensen's inequality (i.e., taking expectation in the randomness of the $X'_i$ inside the supremum and the absolute value on the left and outside these on the right)
$$E\sup_{t\in T}\Big|\sum_{i\le N}(t(X_i) - Et(X_i))\Big| \le E\sup_{t\in T}\Big|\sum_{i\le N}(t(X_i) - t(X'_i))\Big| .$$
Now, the processes $(t(X_i) - t(X'_i))_{i\le N}$ and $(\varepsilon_i(t(X_i) - t(X'_i)))_{i\le N}$ have the same distribution, so that
$$E\sup_{t\in T}\Big|\sum_{i\le N}(t(X_i) - t(X'_i))\Big| = E\sup_{t\in T}\Big|\sum_{i\le N}\varepsilon_i(t(X_i) - t(X'_i))\Big| .$$
The conclusion then follows from the triangle inequality. □

Lemma 11.8.2 We have
$$E\sup_{t\in T}\Big|\sum_{i\le N}\varepsilon_i|t(X_i)|\Big| \le 2S(T) . \tag{11.36}$$
Proof Use Corollary 6.5.2 at a given value of the $(X_i)_{i\le N}$. □


Theorem 11.8.3 (The Giné-Zinn Theorem [35]) We have
$$E\sup_{t\in T}\sum_{i\le N}|t(X_i)| \le \sup_{t\in T}\sum_{i\le N}E|t(X_i)| + 4S(T) . \tag{11.37}$$
To better understand this bound, observe that by Jensen's inequality we have $\sup_{t\in T}\sum_{i\le N}E|t(X_i)| \le E\sup_{t\in T}\sum_{i\le N}|t(X_i)|$. The Giné-Zinn theorem is a kind of converse of this simple inequality. Once we control S(T) (a sum which involves cancellations) and $\sup_{t\in T}\sum_{i\le N}E|t(X_i)|$, we control the left-hand side of (11.37), a sum which does not involve cancellations (as all the terms are of the same sign).

Proof We have
$$\sum_{i\le N}|t(X_i)| \le \sum_{i\le N}E|t(X_i)| + \Big|\sum_{i\le N}(|t(X_i)| - E|t(X_i)|)\Big| ,$$
so that
$$E\sup_{t\in T}\sum_{i\le N}|t(X_i)| \le \sup_{t\in T}\sum_{i\le N}E|t(X_i)| + E\sup_{t\in T}\Big|\sum_{i\le N}(|t(X_i)| - E|t(X_i)|)\Big| .$$
The first term to the right is the same as the first term to the right of (11.37). Applying Lemma 11.8.1 to the second term to the right (and replacing T by the class $\{|t|;\ t \in T\}$) and then using (11.36) concludes the proof. □
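Inequalities (11.35), (11.36) and (11.37) hold for every class T, so they can be sanity-checked on a toy example in which every expectation is a finite sum. The Python sketch below is a hypothetical illustration, not part of the text: $X_1,\dots,X_N$ are i.i.d. uniform on a three-point space Ω, T consists of three arbitrary functions on Ω (stored as tuples of values), and all expectations are computed exactly by enumerating every value of $(X_i)$ and $(\varepsilon_i)$.

```python
from itertools import product

# Hypothetical toy data: Omega = {0, 1, 2} with the uniform law, N = 3 draws,
# and a class T of three functions on Omega, each stored as a tuple of values.
OMEGA = (0, 1, 2)
N = 3
T = [(1.0, -1.0, 0.0), (0.0, 2.0, -1.0), (-1.0, 0.5, 1.0)]
OUTCOMES = list(product(product(OMEGA, repeat=N), product((-1, 1), repeat=N)))


def expect(f):
    """Exact expectation of f(xs, eps) over all equally likely (X, epsilon) values."""
    return sum(f(xs, eps) for xs, eps in OUTCOMES) / len(OUTCOMES)


mean = {t: sum(t) / len(OMEGA) for t in T}                      # E t(X_i)
mean_abs = {t: sum(abs(v) for v in t) / len(OMEGA) for t in T}  # E |t(X_i)|

# S(T) of (11.34)
S = expect(lambda xs, eps: max(abs(sum(e * t[x] for x, e in zip(xs, eps))) for t in T))
# Left-hand sides of (11.35), (11.36) and (11.37)
lhs_35 = expect(lambda xs, eps: max(abs(sum(t[x] - mean[t] for x in xs)) for t in T))
lhs_36 = expect(lambda xs, eps: max(abs(sum(e * abs(t[x]) for x, e in zip(xs, eps))) for t in T))
lhs_37 = expect(lambda xs, eps: max(sum(abs(t[x]) for x in xs) for t in T))
# Right-hand side of (11.37): sup_t sum_i E|t(X_i)| + 4 S(T)
rhs_37 = max(N * mean_abs[t] for t in T) + 4 * S

print(f"S(T) = {S:.4f}")
print(f"(11.35): {lhs_35:.4f} <= {2 * S:.4f}")
print(f"(11.36): {lhs_36:.4f} <= {2 * S:.4f}")
print(f"(11.37): {lhs_37:.4f} <= {rhs_37:.4f}")
```

Exhaustive enumeration is used instead of Monte Carlo so that the comparison is not blurred by sampling error.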


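The following is also useful:
Lemma 11.8.4 If $Et(X_i) = 0$ for each $t \in T$ and each $i \le N$ then
$$S(T) \le 2E\sup_{t\in T}\Big|\sum_{i\le N}t(X_i)\Big| . \tag{11.38}$$
Proof We work conditionally on the sequence $(\varepsilon_i)_{i\le N}$. Setting $I = \{i \le N;\ \varepsilon_i = 1\}$ and $J = \{i \le N;\ \varepsilon_i = -1\}$, we obtain
$$E\sup_{t\in T}\Big|\sum_{i\le N}\varepsilon_i t(X_i)\Big| \le E\sup_{t\in T}\Big|\sum_{i\in I}t(X_i)\Big| + E\sup_{t\in T}\Big|\sum_{i\in J}t(X_i)\Big| .$$
Now, since $Et(X_i) = 0$ for each $t \in T$ and each $i \le N$, denoting by $E_J$ expectation in the r.v.s $X_i$ for $i \in J$, we have $E_J t(X_i) = t(X_i)$ if $i \notin J$ and $E_J t(X_i) = 0$ if $i \in J$, so that Jensen's inequality implies
$$\sup_{t\in T}\Big|\sum_{i\in I}t(X_i)\Big| = \sup_{t\in T}\Big|E_J\sum_{i\le N}t(X_i)\Big| \le E_J\sup_{t\in T}\Big|\sum_{i\le N}t(X_i)\Big| ,$$
so that taking expectations,
$$E\sup_{t\in T}\Big|\sum_{i\in I}t(X_i)\Big| \le E\sup_{t\in T}\Big|\sum_{i\le N}t(X_i)\Big| .$$
The same bound holds for the sum over J, which proves (11.38). □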
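Lemma 11.8.4 can be checked in the same exact way. In the Python sketch below, which is purely illustrative, the law of $X_i$ is uniform on a four-point set and the three functions of T are chosen (hypothetically) so that each has mean zero, which is the hypothesis of the lemma.

```python
from itertools import product

# Hypothetical toy data: X_i i.i.d. uniform on {0, 1, 2, 3}, N = 3, and a class T
# of functions with E t(X_i) = 0 (each tuple of values sums to zero).
OMEGA = (0, 1, 2, 3)
N = 3
T = [(1, -1, 1, -1), (2, 0, -1, -1), (0, 3, -2, -1)]
assert all(sum(t) == 0 for t in T), "Lemma 11.8.4 needs centered functions"

OUTCOMES = list(product(product(OMEGA, repeat=N), product((-1, 1), repeat=N)))


def expect(f):
    """Exact expectation of f(xs, eps) over all equally likely (X, epsilon) values."""
    return sum(f(xs, eps) for xs, eps in OUTCOMES) / len(OUTCOMES)


# Left-hand side of (11.38): S(T) = E sup_t |sum_i eps_i t(X_i)|
S = expect(lambda xs, eps: max(abs(sum(e * t[x] for x, e in zip(xs, eps))) for t in T))
# E sup_t |sum_i t(X_i)|, without the random signs
no_signs = expect(lambda xs, eps: max(abs(sum(t[x] for x in xs)) for t in T))

print(f"S(T) = {S:.4f} <= 2 E sup|sum t(X_i)| = {2 * no_signs:.4f}")
```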
teT Gey teT i<N 


11.9 Proof of the Decomposition Theorem for Empirical Processes

Consider independent r.v.s $(X_i)_{i\le N}$ valued in a measurable space Ω, and denote by $\lambda_i$ the distribution of $X_i$. Set $\nu = \sum_{i\le N}\lambda_i$. We consider a set T of functions on Ω, and we use the notation
$$S(T) = E\sup_{t\in T}\sum_{i\le N}\varepsilon_i t(X_i) ;\quad \bar S(T) = E\sup_{t\in T}\Big|\sum_{i\le N}\varepsilon_i t(X_i)\Big| , \tag{11.39}$$
and we start the preparations for the proof of Theorem 11.1.1. First we observe that without loss of generality, we may assume $0 \in T$. Indeed, if we fix $t_0 \in T$, the set $T - t_0 := \{t - t_0;\ t \in T\}$ satisfies $S(T - t_0) = S(T)$, and if we have a decomposition $T - t_0 \subset T_1 + T_2$, we have a decomposition $T \subset (T_1 + t_0) + T_2$. So we now assume that $0 \in T$, and according to Lemma 2.2.1 we have
$$\bar S(T) \le 2S(T) . \tag{11.40}$$
For $i \le N$ consider the random function $Z_i$ on T given by $Z_i(t) = t(X_i)$. Define $Z_i(t) = 0$ for $i > N$, so that the functions $(Z_i)_{i\ge1}$ are independent. Then $S = E\sup_{t\in T}\sum_{i\ge1}\varepsilon_i Z_i(t) = S(T)$. The expressions (11.5) and (11.6) take the form
$$\psi_{j,\omega}(s,t) = \sum_{i\le N}|r^j(s(X_i) - t(X_i))|^2 \wedge 1 , \tag{11.41}$$
$$\varphi_j(s,t) = \sum_{i\le N}E\big(|r^j(s(X_i) - t(X_i))|^2 \wedge 1\big) = \int|r^j(s - t)|^2 \wedge 1\,d\nu . \tag{11.42}$$


The main idea is to combine Theorem 11.7.1 and Theorem 9.2.1, but we need an extra piece of information. Let us denote by $j_0 = j_0(T)$ the integer provided by (11.33), so that $\varphi_{j_0}(s,t) \le 4$ for $s,t \in T$.

Lemma 11.9.1 Given $t \in T$ we have
$$\int|t|\,\mathbf 1_{\{2|t| \ge r^{-j_0}\}}\,d\nu \le L\bar S(T) \le LS(T) . \tag{11.43}$$
Proof Since $0 \in T$ and $\varphi_{j_0}(0,t) \le 4$ we have $\int|r^{j_0}t|^2 \wedge 1\,d\nu \le 4$. In particular if $U = \{2|t| \ge r^{-j_0}\}$ then $\nu(U) \le 16$, that is, $\sum_{i\le N}\lambda_i(U) \le 16$. Let $A = \{i \le N;\ \lambda_i(U) \ge 1/2\}$, so that $\operatorname{card}A \le 32$. For $i \notin A$ we have $1 - \lambda_i(U) \ge \exp(-2\lambda_i(U))$, so that $\prod_{i\notin A}(1 - \lambda_i(U)) \ge \exp(-32)$. For $i \le N$ consider the event $\Xi_i$ given by $X_i \in U$ and $X_j \notin U$ for $j \ne i$ and $j \notin A$. Then
$$P(\Xi_i) = \lambda_i(U)\prod_{j\ne i,\ j\notin A}(1 - \lambda_j(U)) \ge \lambda_i(U)/L .$$
Given $\Xi_i$, the r.v. $X_i$ is distributed according to the normalized restriction of $\lambda_i$ to U, so that
$$\frac{1}{P(\Xi_i)}E\big(\mathbf 1_{\Xi_i}|t(X_i)|\big) = \frac{1}{\lambda_i(U)}\int_U|t|\,d\lambda_i ,$$
and hence
$$\int_U|t|\,d\lambda_i \le LE\,\mathbf 1_{\Xi_i}|t(X_i)| \le LE\,\mathbf 1_{\Xi_i}\Big|\sum_{j\le N}\varepsilon_j t(X_j)\Big| , \tag{11.44}$$
where the last inequality follows by Jensen's inequality, averaging in the r.v.s $\varepsilon_j$ for $j \ne i$ outside the absolute values rather than inside. As the events $\Xi_i$ for $i \notin A$ are disjoint, and since $\operatorname{card}A \le 32$, we have
$$\sum_{i\le N}\mathbf 1_{\Xi_i} \le \sum_{i\notin A}\mathbf 1_{\Xi_i} + \sum_{i\in A}\mathbf 1_{\Xi_i} \le 1 + 32 = 33 .$$
Summation of the inequalities (11.44) over $i \le N$ and using (11.40) yields the result. □

Proposition 11.9.2 We can decompose $T \subset T_1 + T_2$ where the set $T_1$ satisfies $0 \in T_1$ and
$$\gamma_2(T_1,d_2) + \gamma_1(T_1,d_\infty) \le LS(T) , \tag{11.45}$$
and where
$$\forall t \in T_2 ,\quad \int|t|\,d\nu \le LS(T) . \tag{11.46}$$
Proof We apply Theorem 11.7.1 and then Theorem 9.2.1, calling $T_2$ what is called $T_2 + T_3$ there. Then (11.46) is a consequence of Lemma 11.9.1. □


Proof of Theorem 11.1.1 Combining (11.45) and Lemma 11.1.2 yields $S(T_1) \le LS(T)$, so that also $\bar S(T_1) \le LS(T)$ since $0 \in T_1$. We may assume that $T_2 \subset T - T_1$, simply by replacing $T_2$ by $T_2 \cap (T - T_1)$. Thus $\bar S(T_2) \le \bar S(T) + \bar S(T_1) \le LS(T)$. Combining with (11.46), Theorem 11.8.3 then implies $E\sup_{t\in T_2}\sum_{i\le N}|t(X_i)| \le LS(T)$ and finishes the proof. □

Proof of Theorem 6.8.3 We apply Theorem 11.1.1 to the case where $\lambda_i = \lambda$ is independent of i, so that $\nu = N\lambda$. Then, with obvious notation, $\gamma_2(T, d_{2,\nu}) = \sqrt N\,\gamma_2(T, d_{2,\lambda})$. We also use Lemma 11.8.4. □


11.10 The Decomposition Theorem for Random Series 


We will now apply the previous result to random series, and we go back to that setting, as in Sect. 11.2. We assume that for some integer N we have $Z_i = 0$ for $i > N$ and that the $(Z_i)_{i\le N}$ are independent, but not necessarily identically distributed. We consider on T the following two distances:
$$d_2(s,t)^2 = \sum_{i\le N}E|Z_i(s) - Z_i(t)|^2 , \tag{11.47}$$
$$d_\infty(s,t) = \inf\{a;\ \forall i \le N,\ |Z_i(s) - Z_i(t)| \le a \text{ a.e.}\} , \tag{11.48}$$
and we assume that they are both finite. The following provides an upper bound for $S = E\sup_{t\in T}\sum_{i\le N}\varepsilon_i Z_i(t)$:
Theorem 11.10.1 We have
$$S \le L(\gamma_2(T,d_2) + \gamma_1(T,d_\infty)) . \tag{11.49}$$
Proof As always, this follows from Bernstein's inequality (4.44) and Theorem 4.5.13. □

In another direction, the following trivial bound does not involve cancellation:

Proposition 11.10.2 We have
$$S \le E\sup_{t\in T}\sum_{i\le N}|Z_i(t)| . \tag{11.50}$$
As our next result shows, these two methods are the only possible methods to bound S. In other words, every situation is a mixture of these. In loose words: chaining using Bernstein's inequality explains all the part of the boundedness which is due to cancellation.

Theorem 11.10.3 For any independent sequence $(Z_i)_{i\le N}$ of random functions, we may find a decomposition $Z_i = Z_i^1 + Z_i^2$ such that each of the sequences $(Z_i^1)_{i\le N}$ and $(Z_i^2)_{i\le N}$ is independent, and the following hold: First,
$$\gamma_2(T,d_2^1) + \gamma_1(T,d_\infty^1) \le LS , \tag{11.51}$$
where the distances $d_2^1$ and $d_\infty^1$ are given by (11.47) and (11.48) where $Z_i$ is replaced by $Z_i^1$. Second,
$$E\sup_{t\in T}\sum_{i\le N}|Z_i^2(t)| \le LS . \tag{11.52}$$


Certainly it would be of interest to consider more precise situations, such as 
the case where T is a metric space and where Z; is a continuous function on T. 
In that case however, it is not claimed that the previous decomposition consists of 
continuous functions. The possibility of this is better left for further research. 


Proof Theorem 11.10.3 is a simple consequence of Theorem 11.1.1. The one difficulty lies in the high level of abstraction required. The independent sequence of random functions $(Z_i)_{i\le N}$ is just an independent sequence of random variables valued in the space $\Omega = \mathbb R^T$. We denote by $\lambda_i$ the law of $Z_i$ on $\Omega = \mathbb R^T$, and we set $\nu = \sum_{i\le N}\lambda_i$.

To each element $t \in T$ we associate the corresponding coordinate function θ(t) on $\Omega = \mathbb R^T$. That is, for $x = (x(s))_{s\in T} \in \Omega$ we have $\theta(t)(x) = x(t)$. Thus we have $\theta(t)(Z_i) = Z_i(t)$.¹¹ It should be obvious that
$$S = E\sup_{t\in T}\sum_{i\le N}\varepsilon_i Z_i(t) = E\sup_{t\in T}\sum_{i\le N}\varepsilon_i\theta(t)(Z_i) = S(\theta(T)) ,$$
where $\theta(T) = \{\theta(t);\ t \in T\}$ is a set of functions on Ω. We use the decomposition of θ(T) provided by Theorem 11.1.1. For each $t \in T$ we fix a decomposition $\theta(t) = \theta(t)^1 + \theta(t)^2$ where $\theta(t)^1 \in \theta(T)_1$ and $\theta(t)^2 \in \theta(T)_2$.

We define then the random functions $Z_i^1$ and $Z_i^2$ on T by
$$Z_i^1(t) = \theta(t)^1(Z_i) ;\quad Z_i^2(t) = \theta(t)^2(Z_i) , \tag{11.53}$$
so that $Z_i(t) = Z_i^1(t) + Z_i^2(t)$. The definition of $Z_i^2(t)$ should make it obvious that (11.52) follows from (11.2). Next,
$$d_2^1(s,t)^2 = \sum_{i\le N}E|Z_i^1(s) - Z_i^1(t)|^2 = \sum_{i\le N}E|\theta(s)^1(Z_i) - \theta(t)^1(Z_i)|^2 = \int|\theta(s)^1 - \theta(t)^1|^2\,d\nu ,$$
so that, with obvious notation, $d_2^1(s,t) = \|\theta(s)^1 - \theta(t)^1\|_{2,\nu}$. That is, the map $t \mapsto \theta(t)^1$ witnesses that the metric space $(T, d_2^1)$ is isometric to a subspace of the metric space $(\theta(T)_1, d_{2,\nu})$, and thus $\gamma_2(T, d_2^1) \le \gamma_2(\theta(T)_1, d_{2,\nu}) \le LS$, using (11.45) in the last inequality. The rest is similar. □


It is probably worth insisting on the highly non-trivial definition (11.53). For $j = 1,2$ we may define a map $\Xi_j : \Omega \to \Omega = \mathbb R^T$ by $\Xi_j(x)(t) = \theta(t)^j(x)$. These are fairly complicated maps. The formula (11.53) reads as $Z_i^j = \Xi_j(Z_i)$. The next example stresses this point.

Example 11.10.4 We should stress a subtle point: we apply Theorem 11.1.1 to θ(T), a set of functions on Ω. When T itself is naturally a set of functions on some other space, say on [0,1], it is not the same to decompose T as a set of functions on [0,1], or θ(T) as a set of functions on Ω. To explain this, consider a set T of functions on [0,1] and independent r.v.s $(\xi_i, X_i)_{i\le N}$ valued in $\mathbb R \times [0,1]$. To study the quantity $\sup_{t\in T}\sum_{i\le N}\varepsilon_i\xi_i t(X_i)$, we have to consider the functions θ(t) on $\mathbb R \times [0,1]$, given for $(x,y) \in \mathbb R \times [0,1]$ by $\theta(t)(x,y) = x\,t(y)$. It is this function which is decomposed. In particular, there is no reason why one should have $\theta(t)^1(x,y)$ of the type $x\,t'(y)$ for a certain function $t'$ on [0,1].

¹¹ The best way to write the proof is to lighten notation by writing $t(Z_i)$ rather than $\theta(t)(Z_i)$ and to think of T as a set of functions on Ω. Please attempt this exercise after you understand the present argument.
11.11 Selector Processes and Why They Matter

Given a number $0 < \delta < 1$, we consider i.i.d. r.v.s $(\delta_i)_{i\le M}$ with
$$P(\delta_i = 1) = \delta ;\quad P(\delta_i = 0) = 1 - \delta . \tag{11.54}$$
We will assume that $\delta \le 1/2$, the most interesting case.¹² The r.v.s $\delta_i$ are often called selectors, because they allow us to select a random subset J of $\{1,\dots,M\}$ of cardinality about $E\operatorname{card}J = \delta M$, namely, the set $\{i \le M;\ \delta_i = 1\}$. They will be used for this purpose in Sects. 19.3.1 and 19.3.2.

Selector processes are also important in that they provide a discrete approximation of the fundamental procedure of constructing independent random points $(X_j)_{j\le N}$ distributed according to μ in a probability space (Ω, μ). We will explain this in a very informal manner. Assuming for clarity that μ has no atoms, let us divide Ω into M small pieces $\Omega_i$ of equal measure, where M is much larger than N. Consider then selectors $(\delta_i)_{i\le M}$ where δ in (11.54) is given by $\delta = N/M$. When $\delta_i = 1$ let us choose a point $Y_i$ in $\Omega_i$. Since $\Omega_i$ is small, how we do this is not very important, but let us be perfectionist and choose $Y_i$ according to the conditional probability $\mu(\cdot \mid \Omega_i)$. Then the collection of points $\{Y_i;\ \delta_i = 1\}$ resembles a collection $\{X_j;\ j \le N'\}$ where the points $(X_j)_{j\le N'}$ are independent, distributed according to μ, and where $N' = \sum_{i\le M}\delta_i$. For M large, N' is nearly a Poisson r.v. of expectation $\delta M = N$. So a more precise statement is that selector processes approximate the operation of choosing a set of N' independent random points in a probability space, where N' is a Poisson r.v.¹³ Many problems are "equivalent", where one considers this random number of points $(X_j)_{j\le N'}$, or a fixed number of points $(X_j)_{j\le N}$ (the so-called Poissonization procedure).
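The informal claim that $N' = \sum_{i\le M}\delta_i$ is nearly Poisson can be quantified. A classical bound (Le Cam's inequality, obtained by coupling each Bernoulli(p) variable with a Poisson(p) variable) states that the total variation distance between the binomial law $B(M, N/M)$ and the Poisson law of mean N is at most $M(N/M)^2 = N^2/M$. The Python sketch below, with hypothetical values of M and N, computes this distance exactly (in log space, to avoid overflow of binomial coefficients) and compares it with that bound.

```python
import math


def log_binom_pmf(k, m, p):
    return (math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
            + k * math.log(p) + (m - k) * math.log1p(-p))


def log_poisson_pmf(k, lam):
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def tv_selectors_vs_poisson(m, n):
    """Total variation distance between N' ~ B(m, n/m) and the Poisson law of mean n."""
    p = n / m
    diff = sum(abs(math.exp(log_binom_pmf(k, m, p)) - math.exp(log_poisson_pmf(k, n)))
               for k in range(m + 1))
    # The binomial law puts no mass above m; add the Poisson mass there.
    poisson_tail = max(0.0, 1.0 - sum(math.exp(log_poisson_pmf(k, n)) for k in range(m + 1)))
    return 0.5 * (diff + poisson_tail)


n = 10  # hypothetical expected number of points, N = delta * M
for m in (50, 200, 1000):
    print(m, tv_selectors_vs_poisson(m, n), "Le Cam bound:", n**2 / m)
```

As M grows with N fixed, the distance shrinks, which is the sense in which selectors emulate a Poisson number of independent points.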

We will call a family of r.v.s of the type $\sum_{i\le M}t_i(\delta_i - \delta)$, where t varies over a set T of sequences, a "selector process", and we set
$$\delta(T) := E\sup_{t\in T}\Big|\sum_{i\le M}t_i(\delta_i - \delta)\Big| . \tag{11.55}$$
According to the previous discussion, we expect for selector processes a result similar to Theorem 6.8.3. This will be proved in the next section as a consequence of Theorem 11.1.1.

¹² The case $\delta > 1/2$ actually follows by the transformation $\delta_i \mapsto 1 - \delta_i$ and $\delta \mapsto 1 - \delta$.

¹³ If you want to emulate an independent sequence $(X_j)_{j\le N'}$ by this procedure, you first consider the collection $\{Y_i;\ \delta_i = 1\}$ and you number those $Y_i$ in a random order.
Since $E(\delta_i - \delta)^2 = \delta(1 - \delta) \le \delta$, in the present case Bernstein's inequality (4.44) yields
$$P\Big(\Big|\sum_{i\le M}t_i(\delta_i - \delta)\Big| \ge v\Big) \le 2\exp\Big(-\min\Big(\frac{v^2}{4\delta\sum_{i\le M}t_i^2},\ \frac{v}{2\max_{i\le M}|t_i|}\Big)\Big) . \tag{11.56}$$
Combining with Theorem 4.5.13, (11.56) implies a first bound on selector processes. If T is a set of sequences with $0 \in T$, then (recalling the quantity δ(T) of (11.55))
$$\delta(T) \le L\big(\sqrt\delta\,\gamma_2(T,d_2) + \gamma_1(T,d_\infty)\big) . \tag{11.57}$$
The following shows that the chaining argument of (11.57) takes care of all "the part of boundedness which comes from cancellation":

Theorem 11.12.1 Given a set T of sequences we can write $T \subset T_1 + T_2$ with
$$\gamma_2(T_1,d_2) \le \frac{L\delta(T)}{\sqrt\delta} ;\quad \gamma_1(T_1,d_\infty) \le L\delta(T) \tag{11.58}$$
and
$$E\sup_{t\in T_2}\sum_{i\le M}|t_i|\delta_i \le L\delta(T) . \tag{11.59}$$
Proof We will show that this is a special case of Theorem 11.1.1. Consider the space $\Omega = \{0,1,\dots,M\}$, and for an element $t = (t_i)_{i\le M} \in T$, consider the real-valued function $\tilde t$ on Ω given by $\tilde t(0) = 0$ and $\tilde t(i) = t_i$ for $1 \le i \le M$. Conversely, for a real-valued function u on Ω, denote by P(u) the sequence $(u(i))_{1\le i\le M}$, and note that $P(\tilde t) = t = (t_i)_{i\le M}$. For $1 \le i \le M$ consider the r.v. $X_i$ valued in Ω given by $X_i = 0$ if $\delta_i = 0$ and $X_i = i$ if $\delta_i = 1$. Then $\tilde t(X_i) = \delta_i t_i$, and if $\tilde T = \{\tilde t;\ t \in T\}$ then
$$S(\tilde T) := E\sup_{\tilde t\in\tilde T}\sum_{i\le M}\varepsilon_i\tilde t(X_i) = E\sup_{t\in T}\sum_{i\le M}\varepsilon_i\delta_i t_i . \tag{11.60}$$
We will prove later that
$$E\sup_{t\in T}\Big|\sum_{i\le M}\varepsilon_i\delta_i t_i\Big| \le 4\delta(T) , \tag{11.61}$$
so that in particular
$$S(\tilde T) \le L\delta(T) . \tag{11.62}$$
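The law $\lambda_i$ of $X_i$ is such that $\lambda_i(\{i\}) = P(X_i = i) = \delta$ and $\lambda_i(\{0\}) = P(X_i = 0) = 1 - \delta$. The measure $\nu = \sum_{i\le M}\lambda_i$ is then such that for any function u on Ω we have
$$\delta\|P(u)\|_2^2 = \delta\sum_{i\le M}u(i)^2 \le \int u^2\,d\nu = \|u\|_2^2 ,$$
where the norms are, respectively, in $\ell^2(M)$ and in $L^2(\nu)$. Consequently,
$$\sqrt\delta\,\|P(u)\|_2 \le \|u\|_2 ;\quad \|P(u)\|_\infty \le \|u\|_\infty . \tag{11.63}$$
Consider the decomposition $\tilde T \subset U_1 + U_2$ provided by Theorem 11.1.1, so that, using (11.62) in the last inequality,
$$\gamma_2(U_1,d_2) + \gamma_1(U_1,d_\infty) \le LS(\tilde T) \le L\delta(T) \tag{11.64}$$
and
$$E\sup_{u\in U_2}\sum_{i\le M}|u(X_i)| \le LS(\tilde T) \le L\delta(T) . \tag{11.65}$$
Since $T = P(\tilde T)$, this provides a decomposition $T \subset T_1 + T_2$ where $T_j = P(U_j)$. It follows from (11.63) that
$$\gamma_2(T_1,d_2) \le \frac{1}{\sqrt\delta}\gamma_2(U_1,d_2) ;\quad \gamma_1(T_1,d_\infty) \le \gamma_1(U_1,d_\infty) ,$$
and then (11.64) implies that (11.58) holds. Furthermore, since $|u(X_i)| \ge \delta_i|u(i)|$, (11.65) implies (11.59).

It remains only to prove (11.61). For this let us denote by $(\delta'_i)_{i\le M}$ a copy of the sequence $(\delta_i)_{i\le M}$ which is independent of all the other r.v.s previously used. We first note that by the triangle inequality we have
$$E\sup_{t\in T}\Big|\sum_{i\le M}(\delta_i - \delta'_i)t_i\Big| \le 2\delta(T) . \tag{11.66}$$
Next, the sequences $(\varepsilon_i|\delta_i - \delta'_i|)_{i\le M}$, $(\varepsilon_i(\delta_i - \delta'_i))_{i\le M}$ and $(\delta_i - \delta'_i)_{i\le M}$ have the same distribution, so that
$$E\sup_{t\in T}\Big|\sum_{i\le M}\varepsilon_i|\delta_i - \delta'_i|t_i\Big| = E\sup_{t\in T}\Big|\sum_{i\le M}\varepsilon_i(\delta_i - \delta'_i)t_i\Big| = E\sup_{t\in T}\Big|\sum_{i\le M}(\delta_i - \delta'_i)t_i\Big| . \tag{11.67}$$
Now, using Jensen's inequality, and since $E|\delta_i - \delta'_i| = 2\delta(1 - \delta) \ge \delta$,
$$\delta E\sup_{t\in T}\Big|\sum_{i\le M}\varepsilon_i t_i\Big| \le E\sup_{t\in T}\Big|\sum_{i\le M}\varepsilon_i|\delta_i - \delta'_i|t_i\Big| \le 2\delta(T) , \tag{11.68}$$
where we use (11.67) and (11.66) in the last inequality. Next, we write
$$E\sup_{t\in T}\Big|\sum_{i\le M}\varepsilon_i\delta_i t_i\Big| \le E\sup_{t\in T}\Big|\sum_{i\le M}\varepsilon_i(\delta_i - \delta)t_i\Big| + \delta E\sup_{t\in T}\Big|\sum_{i\le M}\varepsilon_i t_i\Big| . \tag{11.69}$$
Using Jensen's inequality together with (11.67) and (11.66) we obtain
$$E\sup_{t\in T}\Big|\sum_{i\le M}\varepsilon_i(\delta_i - \delta)t_i\Big| \le E\sup_{t\in T}\Big|\sum_{i\le M}\varepsilon_i(\delta_i - \delta'_i)t_i\Big| \le 2\delta(T) ,$$
and combining (11.68) and (11.69) we obtain (11.61). □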
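Inequality (11.61) is stated for any set T of sequences (with $\delta \le 1/2$), so for small M it can be checked exactly by enumerating all $2^M$ selector configurations and all $2^M$ sign patterns. The Python sketch below is such a check; the values M = 5, δ = 0.3 and the three sequences in T are arbitrary choices made for illustration only.

```python
from itertools import product

# Hypothetical toy data (not from the text): M = 5, delta = 0.3 <= 1/2,
# and three arbitrary sequences t = (t_1, ..., t_M).
M, delta = 5, 0.3
T = [(1.0, -2.0, 0.5, 0.0, 1.5), (0.0, 1.0, -1.0, 2.0, -0.5), (-1.0, 0.0, 1.0, 1.0, 1.0)]
CONFIGS = list(product((0, 1), repeat=M))  # values of (delta_1, ..., delta_M)
SIGNS = list(product((-1, 1), repeat=M))   # values of (eps_1, ..., eps_M)


def prob(d):
    """P((delta_i) = d) under (11.54)."""
    w = 1.0
    for di in d:
        w *= delta if di else 1 - delta
    return w


# delta(T) of (11.55)
delta_T = sum(prob(d) * max(abs(sum(ti * (di - delta) for ti, di in zip(t, d))) for t in T)
              for d in CONFIGS)
# Left-hand side of (11.61): E sup_t |sum_i eps_i delta_i t_i|
lhs_61 = sum(prob(d) * max(abs(sum(e * di * ti for e, di, ti in zip(eps, d, t))) for t in T)
             for d in CONFIGS for eps in SIGNS) / len(SIGNS)

print(f"E sup|sum eps_i delta_i t_i| = {lhs_61:.4f} <= 4 delta(T) = {4 * delta_T:.4f}")
```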


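Exercise 11.12.2 Consider a set T of sequences and assume that for a certain number $A > 0$,
$$\gamma_2(T,d_2) \le \frac{A}{\sqrt\delta} ;\quad \gamma_1(T,d_\infty) \le A .$$
(a) Prove that then $\operatorname{conv}T \subset T_1 + T_2$ where $E\sup_{t\in T_2}\sum_{i\le M}|t_i|\delta_i \le LA$ and
$$\gamma_2(T_1,d_2) \le LA/\sqrt\delta ;\quad \gamma_1(T_1,d_\infty) \le LA .$$
(b) Prove that it is not always true that
$$\gamma_2(\operatorname{conv}T,d_2) \le \frac{LA}{\sqrt\delta} ;\quad \gamma_1(\operatorname{conv}T,d_\infty) \le LA .$$
Hint: Use Exercise 8.3.7 and choose δ appropriately small.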


Theorem 11.12.1 shows that chaining, when performed using Bernstein’s 
inequality, already captures all the possible cancellation. This is remarkable because 
Bernstein’s inequality is not always sharp. We can see that by comparing it with the 
following simple lemma: 
Lemma 11.12.3 Consider a fixed set I. If $u \ge 6\delta\operatorname{card}I$ we have
$$P\Big(\sum_{i\in I}\delta_i \ge u\Big) \le \exp\Big(-\frac{u}{2}\log\frac{u}{2\delta\operatorname{card}I}\Big) . \tag{11.70}$$
Proof We are dealing here with the tails of the binomial law, and (11.70) follows from the Chernov bounds. For a direct proof, considering $\lambda > 0$ we write
$$E\exp\lambda\delta_i \le 1 + \delta e^\lambda \le \exp(\delta e^\lambda) ,$$
so that we have
$$E\exp\Big(\lambda\sum_{i\in I}\delta_i\Big) \le \exp(\delta e^\lambda\operatorname{card}I)$$
and
$$P\Big(\sum_{i\in I}\delta_i \ge u\Big) \le \exp(\delta e^\lambda\operatorname{card}I - \lambda u) .$$
We then take $\lambda = \log(u/(2\delta\operatorname{card}I))$, so that $\lambda \ge 1$ since $u \ge 6\delta\operatorname{card}I$, and $\delta e^\lambda\operatorname{card}I = u/2 \le \lambda u/2$. □

Exercise 11.12.4 Prove that the use of (11.56) to bound the left-hand side of (11.70) is sharp (in the sense that the logarithm of the bound it provides is of the correct order) only for u of order $\delta\operatorname{card}I$.
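To see (11.70) in numbers, the sketch below compares the exact binomial tail $P(\sum_{i\in I}\delta_i \ge u)$ with the right-hand side of (11.70) for a few values of u. The sizes `card_i = 200` and `delta = 0.01` are hypothetical, the function names are ours, and the exact tail is summed in log space with `math.lgamma` so that large binomial coefficients never overflow.

```python
import math


def binomial_tail(n, p, u):
    """Exact P(Bin(n, p) >= u) for an integer u, each term computed in log space."""
    def log_pmf(k):
        return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log1p(-p))
    return sum(math.exp(log_pmf(k)) for k in range(u, n + 1))


def bound_11_70(card_i, delta, u):
    """Right-hand side of (11.70); the lemma assumes u >= 6 * delta * card_i."""
    if u < 6 * delta * card_i:
        raise ValueError("(11.70) is only claimed for u >= 6 * delta * card I")
    return math.exp(-(u / 2) * math.log(u / (2 * delta * card_i)))


card_i, delta = 200, 0.01  # hypothetical sizes, so that delta * card I = 2
for u in (15, 20, 40):
    print(u, binomial_tail(card_i, delta, u), bound_11_70(card_i, delta, u))
```

The bound decays like $\exp(-(u/2)\log u)$, faster than the purely exponential decay in u that a Bernstein-type bound gives in this regime.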


Key Ideas to Remember

• A natural way to bound the discrepancy $E\sup_{f\in\mathcal F}|\sum_{i\le N}\varepsilon_i f(X_i)|$ of a class of functions is to use chaining and Bernstein's inequality.

• An alternate way to control this discrepancy is to give up on possible cancellations and to use the inequality
$$E\sup_{f\in\mathcal F}\Big|\sum_{i\le N}\varepsilon_i f(X_i)\Big| \le E\sup_{f\in\mathcal F}\sum_{i\le N}|f(X_i)| .$$

• Amazingly, there is never a better method than interpolating between the previous two methods: all possible cancellation can be witnessed by Bernstein's inequality.

• The previous result can be interpreted in terms of certain random series of functions. The uniform convergence of these can either be proved from Bernstein's inequality, or without using cancellation, or by a mixture of these two methods.
Even though Theorem 11.6.3 uses only ideas that are also used elsewhere in the book, I formulated it only after reading the paper [18] by Witold Bednorz and Rafał Martynek, where the authors prove in a more complicated way that chaining can be performed from a majorizing measure. In fact, the possibility of performing chaining from such a measure goes back at least to [112]. The contents of Sect. 11.5, and in particular Lemma 11.5.3, are taken from [18]. The use of convexity to construct majorizing measures goes back to Fernique (see Sect. 3.3.2) but is used for the first time in [18] in the context of families of distances.


Chapter 12
Infinitely Divisible Processes

The secret of the present chapter can easily be revealed. Infinitely divisible processes can be seen as sums of random series $\sum_{i\ge1}\varepsilon_i Z_i$ of functions where the sequence $(\varepsilon_i)_{i\ge1}$ is an independent sequence of Bernoulli r.v.s, which is independent of the sequence $(Z_i)$, and where the sequence of functions $(Z_i)_{i\ge1}$ shares enough properties with an independent sequence to make all results of the previous chapter go through. Moreover, when $Z_i$ is a multiple of a character, the process behaves just as a random Fourier series, and the results on these extend to this case.

The main result of this chapter is a decomposition theorem for infinitely divisible processes, Theorem 12.3.5, in the spirit of Theorem 11.1.1. It is a consequence of Theorem 11.10.3, our main result on random series of functions.


12.1 Poisson r.v.s and Poisson Point Processes 


We start by recalling some classical facts. A reader needing more details may refer 
to her favorite textbook. 
A Poisson r.v. $X$ of expectation $a$ is a r.v. such that

$$\forall n\ge0,\quad P(X=n)=\frac{a^n}{n!}\exp(-a), \tag{12.1}$$

and indeed $EX=a$. Then, for any $b\in\mathbb C$,

$$Eb^X=\exp(-a)\sum_{n\ge0}\frac{(ab)^n}{n!}=\exp(a(b-1)), \tag{12.2}$$

and in particular

$$E\exp(\lambda X)=\exp\big(a(\exp\lambda-1)\big). \tag{12.3}$$


Consequently, the sum of two independent Poisson r.v.s is Poisson. 
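The identities (12.1) to (12.3) and the stability of Poisson laws under independent sums are easy to check numerically. The following sketch (Python, standard library only; all function names are illustrative, not from the text) sums the series (12.2) with $b=\exp\lambda$, compares it with the closed form (12.3), and then convolves two Poisson laws. Probabilities are computed in log space so that large $n$ does not overflow.

```python
import math

def poisson_pmf(n, a):
    """P(X = n) = a^n exp(-a) / n! as in (12.1), computed in log space (requires a > 0)."""
    return math.exp(n * math.log(a) - a - math.lgamma(n + 1))

def mgf_by_summation(lam, a, terms=200):
    """E exp(lam X): the series (12.2) with b = exp(lam), truncated after `terms` terms."""
    return sum(math.exp(n * (math.log(a) + lam) - a - math.lgamma(n + 1)) for n in range(terms))

def mgf_closed_form(lam, a):
    """Right-hand side of (12.3): exp(a (exp(lam) - 1))."""
    return math.exp(a * (math.exp(lam) - 1.0))

def pmf_of_sum(a, b, n):
    """P(X + Y = n) for independent Poisson X, Y of expectations a and b (a convolution)."""
    return sum(poisson_pmf(k, a) * poisson_pmf(n - k, b) for k in range(n + 1))

for a in (0.5, 3.0):
    for lam in (-1.0, 0.5, 1.0):
        print(a, lam, mgf_by_summation(lam, a), mgf_closed_form(lam, a))

# The sum of independent Poisson(1.5) and Poisson(2.5) r.v.s has the Poisson(4) law.
print(max(abs(pmf_of_sum(1.5, 2.5, n) - poisson_pmf(n, 4.0)) for n in range(30)))
```

The convolution check is the numerical face of the remark that the product of two generating functions $\exp(a(b-1))\exp(a'(b-1))$ is again of the form (12.2).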

Consider a $\sigma$-finite measure $\nu$ on a measurable space $\Omega$. A Poisson point process of intensity $\nu$ is a random subset $\Pi$ of $\Omega$ (at most countable) with the following properties: for any measurable set $A$ of finite measure,

$$\mathrm{card}(A\cap\Pi)\ \text{is a Poisson r.v. of expectation}\ \nu(A), \tag{12.4}$$

and moreover

$$\text{if } A_1,\dots,A_k \text{ are disjoint measurable sets, the r.v.s } \big(\mathrm{card}(A_\ell\cap\Pi)\big)_{\ell\le k} \text{ are independent.} \tag{12.5}$$


A very important result (which we do not prove) is as follows: 


Lemma 12.1.1 Consider a Poisson point process of intensity $\nu$ and a set $A$ with $0<\nu(A)<\infty$. Given $\mathrm{card}(A\cap\Pi)=N$, the set $\Pi\cap A$ has the same distribution as a set $\{X_1,\dots,X_N\}$, where the variables $X_i$ are independent and distributed according to the probability $\lambda$ on $A$ given by $\lambda(B)=\nu(A\cap B)/\nu(A)$ for $B\subset A$.


The purpose of the next exercise is to provide a proof of the previous result and to give you a chance to really understand it.¹


Exercise 12.1.2 Assuming that $\nu(\Omega)<\infty$, consider a subset $\Pi$ of $\Omega$ generated by the following procedure. First, consider a Poisson r.v. $M$ with $EM=\nu(\Omega)$. Second, given $M$, consider i.i.d. points $Y_1,\dots,Y_M$ distributed according to the probability $P$ on $\Omega$ given by $P(A)=\nu(A)/\nu(\Omega)$, and set $\Pi=\{Y_1,\dots,Y_M\}$. Prove that (12.4) holds for a subset $A$ of $\Omega$ and that the property of Lemma 12.1.1 holds too. When $\nu(\Omega)$ is not finite but $\nu$ is $\sigma$-finite, show how to actually construct a Poisson point process on $\Omega$.
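To see the construction of Exercise 12.1.2 at work, here is a small Monte Carlo sketch (Python, standard library only). It assumes, purely for illustration, that $\Omega=[0,1)$ and that $\nu$ is $20$ times Lebesgue measure; the Poisson sampler is Knuth's elementary multiplication method, not anything from the text. For $A=[0,0.3)$ the count $\mathrm{card}(A\cap\Pi)$ should have mean and variance close to $\nu(A)=6$, as (12.4) requires, and its covariance with the count in the disjoint set $[0.3,1)$ should be close to $0$, in line with (12.5). A vanishing covariance is of course only a symptom of independence, not a proof of it.

```python
import math
import random

def poisson(rng, a):
    """Poisson r.v. of expectation a by Knuth's multiplication method (adequate for moderate a)."""
    threshold, n, prod = math.exp(-a), 0, rng.random()
    while prod > threshold:
        n += 1
        prod *= rng.random()
    return n

def two_step_process(rng, total_mass):
    """Exercise 12.1.2 with Omega = [0, 1) and nu = total_mass * Lebesgue:
    draw M Poisson with EM = nu(Omega), then M i.i.d. points with law nu / nu(Omega)."""
    return [rng.random() for _ in range(poisson(rng, total_mass))]

def mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(0)
c, trials = 20.0, 20_000
counts_a, counts_b = [], []
for _ in range(trials):
    points = two_step_process(rng, c)
    counts_a.append(sum(x < 0.3 for x in points))   # A = [0, 0.3), nu(A) = 6
    counts_b.append(sum(x >= 0.3 for x in points))  # B = [0.3, 1), nu(B) = 14, disjoint from A

mean_a, mean_b = mean(counts_a), mean(counts_b)
var_a = mean([(x - mean_a) ** 2 for x in counts_a])
cov_ab = mean([(x - mean_a) * (y - mean_b) for x, y in zip(counts_a, counts_b)])
print(mean_a, var_a, mean_b, cov_ab)  # roughly 6, 6, 14 and 0
```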


We will enumerate all the points of the set $\Pi$ as $(Z_i)_{i\ge1}$.² We observe first that as a consequence of (12.4), for any set $A$,

$$\sum_{i\ge1}P(Z_i\in A)=E\,\mathrm{card}(\Pi\cap A)=\nu(A). \tag{12.6}$$


¹ If this does not suffice, you may look into [46], Proposition 3.8.


² Here we implicitly assume that there are infinitely many such points, i.e., $\nu$ has infinite mass, which is the case of interest. One should also keep in mind that there are many possible ways to enumerate the points, and one should be careful to write only formulas which make sense independently of the way these points are enumerated.
Consequently, if $f$ is an integrable function on $\Omega$ we have

$$E\sum_{i\ge1}f(Z_i)=\int_\Omega f(\beta)\,d\nu(\beta), \tag{12.7}$$

as is seen by approximation by functions taking only finitely many values. If $A$ is a measurable set of finite measure, $f=c\mathbf 1_A$ and $X=\mathrm{card}(A\cap\Pi)$, then $\sum_{i\ge1}f(Z_i)=cX$, so that

$$E\exp\Big(\lambda\sum_{i\ge1}f(Z_i)\Big)=E\exp(\lambda cX)=\exp\big(\nu(A)(\exp(\lambda c)-1)\big),$$

where we use that $X$ is a Poisson r.v. of expectation $\nu(A)$ and (12.3) in the second equality. When $f$ is a step function, $f=\sum_{\ell\le k}c_\ell\mathbf 1_{A_\ell}=:\sum_{\ell\le k}f_\ell$ for disjoint sets $A_\ell$, the previous formula combined with (12.5) implies

$$E\exp\Big(\lambda\sum_{i\ge1}f(Z_i)\Big)=\prod_{\ell\le k}E\exp\Big(\lambda\sum_{i\ge1}f_\ell(Z_i)\Big)=\exp\Big(\sum_{\ell\le k}\nu(A_\ell)(\exp(\lambda c_\ell)-1)\Big)$$
$$=\exp\Big(\int_\Omega\big(\exp(\lambda f(\beta))-1\big)\,d\nu(\beta)\Big), \tag{12.8}$$

a formula which also holds by approximation under the condition that the exponent in the right-hand side is well defined. This is in particular the case if $f$ is bounded above and if $\int_\Omega|f|\wedge1\,d\nu<\infty$, where we recall the notation $a\wedge b=\min(a,b)$. This formula will let us obtain bounds on the quantities $\sum_{i\ge1}f(Z_i)$ pretty much as if the $Z_i$ were independent r.v.s. It contains almost all that we need to know about Poisson point processes. Let us state right away the basic lemma.


Lemma 12.1.3 Consider $0\le f\le1$. Then:
(a) If $4A\le\int f\,d\nu$ we have

$$P\Big(\sum_{i\ge1}f(Z_i)\le A\Big)\le\exp(-A). \tag{12.9}$$

(b) If $A\ge4\int f\,d\nu$ we have

$$P\Big(\sum_{i\ge1}f(Z_i)\ge A\Big)\le\exp\Big(-\frac A2\Big). \tag{12.10}$$

Proof Using (12.8), the proof is nearly identical to that of Lemma 7.7.2. □
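Formula (12.8) and Lemma 12.1.3 can be probed by simulation in a toy setting chosen only for illustration: $\nu$ is $c=2$ times Lebesgue measure on $[0,1)$ and $f(x)=x$, so that $0\le f\le1$ and $\int f\,d\nu=1$. The right-hand side of (12.8) is then $\exp\big(c((e^\lambda-1)/\lambda-1)\big)$. The sketch below (Python, standard library only; helper names are illustrative) compares this with a Monte Carlo average, and checks the lower tail (12.9) with $A=1/4$ and the upper tail (12.10) with $A=4$. In this example the bounds of Lemma 12.1.3 are far from tight.

```python
import math
import random

def poisson(rng, a):
    """Poisson r.v. of expectation a (Knuth's multiplication method)."""
    threshold, n, prod = math.exp(-a), 0, rng.random()
    while prod > threshold:
        n += 1
        prod *= rng.random()
    return n

def sum_over_points(rng, c, f):
    """sum_i f(Z_i) for a Poisson point process of intensity c * Lebesgue on [0, 1)."""
    return sum(f(rng.random()) for _ in range(poisson(rng, c)))

def f(x):
    return x  # 0 <= f <= 1, and int f d(nu) = c / 2

rng = random.Random(1)
c, lam, trials = 2.0, 0.5, 50_000
sums = [sum_over_points(rng, c, f) for _ in range(trials)]

# (12.8): E exp(lam * sum f(Z_i)) = exp(int (exp(lam f) - 1) d nu) = exp(c ((e^lam - 1) / lam - 1))
mc_mgf = sum(math.exp(lam * s) for s in sums) / trials
exact_mgf = math.exp(c * ((math.exp(lam) - 1.0) / lam - 1.0))

# Lemma 12.1.3 with int f d nu = 1: (a) with A = 1/4, (b) with A = 4
lower_tail = sum(s <= 0.25 for s in sums) / trials  # should be <= exp(-0.25)
upper_tail = sum(s >= 4.0 for s in sums) / trials   # should be <= exp(-4 / 2)
print(mc_mgf, exact_mgf, lower_tail, math.exp(-0.25), upper_tail, math.exp(-2.0))
```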
Infinitely divisible r.v.s are standard fare of probability theory. Intuitively they are r.v.s which are sums of infinitely small independent r.v.s. We will call an infinitely divisible random process a family $(X_t)$ of r.v.s such that every finite linear combination of them is infinitely divisible.

Let us not be misled by our terminology. What is called in the literature an infinitely divisible process is usually a very much more special structure.³ The “infinitely divisible processes” of the literature are to the processes we study what Brownian motion is to general Gaussian processes.

The beautiful classical body of knowledge about infinitely divisible r.v.s (such as the so-called Lévy-Khintchine representation of their characteristic function) has little bearing on our study, because what matters here is a certain representation of infinitely divisible processes as sums of random series which are conditionally Bernoulli processes. For this reason we will directly define infinitely divisible processes as sums of certain random series, and we postpone to Appendix C the task of relating this definition to the classical one.

Consider an index set $T$ and the measurable space $\mathcal C=\mathbb C^T$, provided with the $\sigma$-algebra generated by the coordinate functions.⁴ We consider a $\sigma$-finite measure $\nu$ on $\mathcal C$, and we make the fundamental hypothesis that

$$\forall t\in T,\quad\int_{\mathcal C}|\beta(t)|^2\wedge1\,d\nu(\beta)<\infty. \tag{12.11}$$

A Poisson process of intensity measure $\nu$ generates a sequence $(Z_i)_{i\ge1}$ of points of $\mathcal C$, that is, a sequence of functions on $T$. Under (12.11), given $t\in T$, it follows from the formula (12.7) (applied to the function $\beta\mapsto|\beta(t)|^2\wedge1$) that $E\sum_{i\ge1}|Z_i(t)|^2\wedge1=\int_{\mathcal C}|\beta(t)|^2\wedge1\,d\nu(\beta)<\infty$, so that $\sum_{i\ge1}|Z_i(t)|^2\wedge1<\infty$ a.s., and hence also $\sum_{i\ge1}|Z_i(t)|^2<\infty$. Consider an independent Bernoulli sequence $(\varepsilon_i)_{i\ge1}$, independent of the process $(Z_i)$. Then the series $X_t=\sum_{i\ge1}\varepsilon_iZ_i(t)$ converges a.s.


Definition 12.2.1 An infinitely divisible (symmetric and without Gaussian component) process is a collection $(X_t)_{t\in T}$ as above where

$$X_t=\sum_{i\ge1}\varepsilon_iZ_i(t). \tag{12.12}$$

The measure $\nu$ is called the Lévy measure of the process.


³ That is, a process on $\mathbb R$ with stationary increments.


⁴ As usual, we will not care about measure-theoretic details, because when considering processes $(X_t)_{t\in T}$ we are only interested in the joint distribution of finite collections of the r.v.s $X_t$.
Thus, given the randomness of the $(Z_i)$, an infinitely divisible process is a Bernoulli process. This will be used in a fundamental way. Typically, the convergence of the series $\sum_{i\ge1}\varepsilon_iZ_i(t)$ is permitted by cancellation between the terms.

There is no reason why $X_t$ should have an expectation. (It can be shown that this is the case if and only if $\int_{\mathcal C}|\beta(t)|\mathbf 1_{\{|\beta(t)|\ge1\}}\,d\nu(\beta)<\infty$.) When studying infinitely divisible processes, medians are a better measure of their size than expectations. To keep the statements simple, we have however decided to stick to expectations.

It is an important fact that $p$-stable processes (in the sense of Sect. 5.1) are infinitely divisible processes in the sense of Definition 12.2.1. This is explained in the next section.


12.3 Overview of Results


Throughout this section $(X_t)_{t\in T}$ denotes an infinitely divisible process, as in Definition 12.2.1, whose notation we keep. Following our general philosophy, our goal is to relate the size of the r.v. $\sup_{t\in T}X_t$ with a proper measure of “the size of $T$”. Given a number $r\ge4$, we will use the “family of distances” on $T\times T$ given by

$$\varphi_j(s,t)=\int_{\mathcal C}|r^j(\beta(s)-\beta(t))|^2\wedge1\,d\nu(\beta) \tag{12.13}$$

(where $j\in\mathbb Z$) to measure the size of $T$, where of course $\nu$ is the Lévy measure of the process.
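For a Lévy measure with finitely many atoms, the integral in (12.13) is a finite sum, which makes the behaviour of the family $(\varphi_j)$ easy to inspect. In the hypothetical example below (Python; the three-point $T$, the atoms and the weights are invented for illustration), $\nu=\sum_kw_k\delta_{\beta_k}$ and $r=4$. The helper `largest_j` (a name made up here) returns the largest $j$ in a given range with $\varphi_j(s,t)$ below a threshold, which is the kind of choice the integers $j_n(A)$ of Theorem 12.3.1 encode. Since $r\ge1$, $\varphi_j(s,t)$ is nondecreasing in $j$, and it never exceeds the total mass of $\nu$.

```python
def phi(j, s, t, atoms, r=4.0):
    """phi_j(s, t) of (12.13) when nu = sum_k w_k delta_{beta_k} has finitely many atoms.

    atoms: list of pairs (w_k, beta_k), with w_k > 0 and beta_k a dict from points of T to reals.
    """
    return sum(w * min((r ** j * (beta[s] - beta[t])) ** 2, 1.0) for w, beta in atoms)

def largest_j(s, t, threshold, atoms, j_range=range(-20, 21), r=4.0):
    """Hypothetical helper: largest j in j_range with phi_j(s, t) <= threshold (None if there is none)."""
    admissible = [j for j in j_range if phi(j, s, t, atoms, r) <= threshold]
    return max(admissible) if admissible else None

# Toy Levy measure on functions of T = {"a", "b", "c"} (illustrative numbers only).
atoms = [
    (0.5, {"a": 0.0, "b": 1.0, "c": 3.0}),
    (2.0, {"a": 0.1, "b": 0.0, "c": -0.2}),
]
print([round(phi(j, "a", "b", atoms), 4) for j in range(-2, 3)])
print(largest_j("a", "b", 1.0, atoms))  # 1, since phi_1(a, b) = 0.82 while phi_2(a, b) = 2.5
```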


12.3.1 The Main Lower Bound 


Let us stress that for the time being, we consider only real-valued processes. 

In words, our main lower bound shows that the boundedness of an infinitely divisible process implies a certain smallness condition on the index set $T$. The level of abstraction reached here makes it difficult to understand the power of this result, which will become apparent in the next section, which relies on it.
Theorem 12.3.1 Consider an infinitely divisible process $(X_t)_{t\in T}$, as in Definition 12.2.1, and assume that $T$ is countable. Then there exists an admissible sequence $(\mathcal A_n)$ of partitions of $T$ and for $A\in\mathcal A_n$ an integer $j_n(A)$ such that the following holds:

$$\forall t\in T,\quad\sum_{n\ge0}2^nr^{-j_n(A_n(t))}\le KE\sup_{t\in T}X_t, \tag{12.14}$$

$$\forall n\ge0,\ \forall A\in\mathcal A_n,\ \forall s,t\in A,\quad\varphi_{j_n(A)}(s,t)\le2^{n+2}, \tag{12.15}$$

where $\varphi_j(s,t)$ is given by (12.13).

Proof This is a special instance of Theorem 11.7.1. The condition (11.8) is a consequence of Lemma 12.1.3. □


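Exercise 12.3.2 Analyze the proof to convince yourself that in the right-hand side of (12.14), one could write instead $KM$, where $M$ is a median of the r.v. $E_\varepsilon\sup_{t\in T}\sum_{i\ge1}\varepsilon_iZ_i(t)$ (where $E_\varepsilon$ denotes expectation in the r.v.s $\varepsilon_i$ only).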


We explain now why Theorem 12.3.1 can be seen as a considerable extension of Theorem 5.2.1. As is explained in Sect. C.4, a $p$-stable process is infinitely divisible, and moreover its Lévy measure $\nu$ is obtained by the following construction. Denoting by $\lambda$ Lebesgue's measure on $\mathbb R^+$, there exists a finite positive measure $m$ on $\mathcal C=\mathbb C^T$ such that $\nu$ is the image of the measure $\lambda\otimes m$ on $\mathbb R^+\times\mathcal C$ under the map $(x,\beta)\mapsto x^{-1/p}\beta$. By change of variable, it is obvious that

$$\int_{\mathbb R^+}\big(|ax^{-1/p}|^2\wedge1\big)\,dx=C_1(p)|a|^p. \tag{12.16}$$

Consequently

$$\varphi_j(s,t)=\int_{\mathcal C}\int_{\mathbb R^+}\big(|x^{-1/p}r^j(\omega(s)-\omega(t))|^2\wedge1\big)\,dx\,dm(\omega)=C_1(p)r^{jp}\int_{\mathcal C}|\omega(s)-\omega(t)|^p\,dm(\omega)=C_1(p)r^{jp}\bar d(s,t)^p,$$

where $\bar d(s,t)^p=\int_{\mathcal C}|\omega(s)-\omega(t)|^p\,dm(\omega)$. It is possible to show that the distance $d$ associated with the $p$-stable process $(X_t)$ as in (5.4) is a multiple of $\bar d$ (see Appendix C). Then (12.15) implies $\Delta(A,d)\le K2^{n/p}r^{-j_n(A)}$, and (12.14) yields

$$\sum_{n\ge0}2^{n/q}\Delta(A_n(t),d)\le KE\sup_{t\in T}X_t,$$

where $1/q=1-1/p$, and thus $\gamma_q(T,d)\le KE\sup_{t\in T}X_t$, which is the content of Theorem 5.2.1.
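The constant in (12.16) can be made explicit. For $0<p<2$ the integrand equals $1$ for $x\le|a|^p$ and $a^2x^{-2/p}$ beyond, so the integral is $|a|^p+\frac{p}{2-p}|a|^p$, that is, $C_1(p)=2/(2-p)$. The short numerical sketch below (Python, standard library only; the truncation of the integral to $[e^{-69},e^{69}]$ and the step count are arbitrary choices made here) confirms this with a midpoint rule after the substitution $x=e^y$. The check is restricted to $p\le1.5$, because for $p$ close to $2$ the tail $\int_X^\infty x^{-2/p}\,dx$ decays too slowly for a fixed cutoff.

```python
import math

def c1_closed_form(p):
    """C_1(p) = 2 / (2 - p) for 0 < p < 2 (computed in the paragraph above)."""
    return 2.0 / (2.0 - p)

def lhs_12_16(a, p, y_min=-69.0, y_max=69.0, steps=50_000):
    """Midpoint-rule value of int_0^inf min(a^2 x^(-2/p), 1) dx after substituting x = exp(y).

    The integral is truncated to [exp(y_min), exp(y_max)]; for p <= 1.5 the neglected parts are tiny.
    """
    h = (y_max - y_min) / steps
    total = 0.0
    for k in range(steps):
        x = math.exp(y_min + (k + 0.5) * h)
        total += min(a * a * x ** (-2.0 / p), 1.0) * x * h  # dx = x dy
    return total

for p in (0.5, 1.0, 1.5):
    for a in (0.3, 2.0):
        print(p, a, lhs_12_16(a, p) / abs(a) ** p, c1_closed_form(p))
```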




12.3.2 The Decomposition Theorem for Infinitely Divisible Processes


Let us now turn to the difficult problem of bounding infinitely divisible processes. Let us first show how to bound infinitely divisible processes using chaining. On $T$ we consider the distances

$$d_2(s,t)=\Big(\int_{\mathcal C}|\beta(s)-\beta(t)|^2\,d\nu(\beta)\Big)^{1/2}, \tag{12.17}$$

$$d_\infty(s,t)=\inf\{a>0;\ |\beta(s)-\beta(t)|\le a\ \ \nu\text{-a.e.}\}. \tag{12.18}$$

The distance $d_\infty$ is simply the distance induced by the norm of $L^\infty(\nu)$ when one considers a point $t$ of $T$ as the function on $\mathcal C$ given by the map $\beta\mapsto\beta(t)$, and similarly the distance $d_2$ is the distance induced by the norm of $L^2(\nu)$. We will prove a suitable version of Bernstein's inequality which will make the next result appear as the chaining bound (4.56).


Theorem 12.3.3 We have

$$E\sup_{t\in T}X_t\le L\big(\gamma_2(T,d_2)+\gamma_1(T,d_\infty)\big). \tag{12.19}$$


There is however a method very different from chaining to control the size of an infinitely divisible process, a method which owes nothing to cancellation, using the inequality $|X_t|=|\sum_{i\ge1}\varepsilon_iZ_i(t)|\le\sum_{i\ge1}|Z_i(t)|$. This motivates the following definition:

Definition 12.3.4 Consider a set $T$, a $\sigma$-finite measure $\nu$ on $\mathcal C=\mathbb C^T$ and assume that $\int_{\mathcal C}|\beta(t)|\wedge1\,d\nu(\beta)<\infty$ for each $t\in T$. Then we define the process $(|X|_t)_{t\in T}$ by

$$|X|_t=\sum_{i\ge1}|Z_i(t)|. \tag{12.20}$$

When we control the supremum of the process $(|X|_t)_{t\in T}$, we may say that the boundedness of the process $(X_t)_{t\in T}$ owes nothing to cancellation.

We have described two very different reasons why an infinitely divisible process $(X_t)_{t\in T}$ may be bounded.

• The boundedness may be witnessed by chaining as in (12.19).
• It may happen that the process $(|X|_t)_{t\in T}$ is already bounded, and then the boundedness of $(X_t)$ owes nothing to cancellation.

The main result of this chapter, the decomposition theorem for infinitely divisible processes below, states that there is no other possible reason: every bounded infinitely divisible process is a mixture of the previous two situations.
Theorem 12.3.5 (The Decomposition Theorem for Infinitely Divisible Processes) Consider an infinitely divisible process $(X_t)_{t\in T}$, as in Definition 12.2.1, and assume that $T$ is countable. Let $S=E\sup_{t\in T}X_t$. Then we can write $X_t=X_t^1+X_t^2$ in a manner that each of the processes $(X_t^1)_{t\in T}$ and $(X_t^2)_{t\in T}$ is infinitely divisible and that

$$\gamma_2(T,d_2^1)+\gamma_1(T,d_\infty^1)\le KS, \tag{12.21}$$

where the distances $d_\infty^1$ and $d_2^1$ are given by (12.18) and (12.17) (for the process $(X_t^1)$ rather than $(X_t)$), whereas

$$E\sup_{t\in T}|X^2|_t\le KS. \tag{12.22}$$

This decomposition witnesses the size of $E\sup_{t\in T}X_t$. Indeed $E\sup_{t\in T}X_t\le E\sup_{t\in T}X_t^1+E\sup_{t\in T}X_t^2$. The first term on the right is bounded through chaining; see (12.19). The second term is bounded because $E\sup_{t\in T}|X_t^2|\le E\sup_{t\in T}|X^2|_t$, which is already bounded by (12.22).

The decomposition theorem is a close cousin of Theorem 11.10.3. In words it can be formulated as follows:


Chaining using Bernstein’s inequality captures exactly the part of the 
boundedness of an infinitely divisible process that is due to cancellation. 


Exercise 12.3.6 Learn about the Lévy measure of a $p$-stable process in Sect. C.4 (which was described at the end of Sect. 12.3.1). Show that if such a process is not zero, the process $(|X|_t)_{t\in T}$ is not defined. Conclude that when applying the decomposition theorem to a $p$-stable process $(X_t)_{t\in T}$, it is not possible to take the pieces $(X_t^1)_{t\in T}$ and $(X_t^2)_{t\in T}$ both $p$-stable.


12.3.3 Upper Bounds Through Bracketing 


Our result is called a “bracketing theorem” because for each $A\in\mathcal A_n$ we control the size $h_A(\omega)=\sup_{s,t\in A}|\omega(s)-\omega(t)|$ of the “brackets” $[\inf_{t\in A}\omega(t),\sup_{t\in A}\omega(t)]$.


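Theorem 12.3.7 Consider an admissible sequence $(\mathcal A_n)$ of $T$, and for $A\in\mathcal A_n$ and $\omega\in\mathcal C=\mathbb R^T$ consider $h_A(\omega)=\sup_{s,t\in A}|\omega(s)-\omega(t)|$. Assume that for $A\in\mathcal A_n$ we are given $j_n(A)\in\mathbb Z$ satisfying

$$A\in\mathcal A_n,\ C\in\mathcal A_{n-1},\ A\subset C\ \Rightarrow\ j_n(A)\ge j_{n-1}(C).$$

Assume that for some numbers $r\ge2$ and $S>0$ we have

$$\forall A\in\mathcal A_n,\quad\int_{\mathcal C}\big(r^{2j_n(A)}h_A^2\wedge1\big)\,d\nu\le2^n, \tag{12.23}$$

$$\int_{\mathcal C}h_T\mathbf 1_{\{2h_T\ge r^{-j_0(T)}\}}\,d\nu\le S, \tag{12.24}$$

and

$$\forall t\in T,\quad\sum_{n\ge0}2^nr^{-j_n(A_n(t))}\le S. \tag{12.25}$$

Then $E\sup_{t\in T}|X_t|\le KS$.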


The principle of Theorem 12.3.7 goes back at least to [112], but its power does 
not seem to have been understood. 


12.3.4 Harmonizable Infinitely Divisible Processes 


Motivated by Chap. 7, we may expect that when “there is stationarity” (in the sense 
that there is a kind of translation invariant structure), it will be much easier to find 
upper bounds for infinitely divisible processes. In this section, in contrast with the 
rest of the chapter, infinitely divisible processes are permitted to be complex-valued. 

Consider (for simplicity) a metrizable compact group $T$, and its dual $G$, the set of continuous characters on $T$. We denote by $\mathcal C_G$ the set of functions on $T$ which are of the type $a\chi$ where $a\in\mathbb C$ and $\chi$ is a character.


Definition 12.3.8 If $T$ is a metrizable compact group, an infinitely divisible process $(X_t)_{t\in T}$ as in Definition 12.2.1 is called harmonizable if its Lévy measure is supported by $\mathcal C_G$.


Special classes of such processes were extensively studied by M. Marcus and G. Pisier [62] and later again by M. Marcus [59]. Although it would be hard to argue that these processes are intrinsically important, our results exemplify the amount of progress permitted by the idea of families of distances.⁵ To bring forward that the study of these processes is closely related to that of random Fourier series, we state four results which parallel Lemmas 7.10.4 to 7.10.7 and provide a complete understanding of when these processes are bounded a.s.⁶ Here $\mu$ denotes the Haar measure of $T$ and $(X_t)_{t\in T}$ an infinitely divisible harmonizable process. The first lemma is obvious.

⁵ For example, Marcus [59] obtains necessary and sufficient conditions for boundedness only in the case of harmonizable $p$-stable processes considered in Sect. 12.3.5.

⁶ Since the proofs of these results are, in a high-level sense, mere translations of the proofs of the results for random Fourier series, we will not provide them.


Lemma 12.3.9 If the process $(X_t)_{t\in T}$ is a.s. uniformly bounded, given $\alpha>0$ there exists $M$ such that $P(\sup_{t\in T}|X_t|\ge M)\le\alpha$.


Theorem 12.3.10 There exists a number $\alpha_1>0$ with the following property. Assume that for a certain number $M$ we have

$$P\Big(\sup_{t\in T}|X_t|\ge M\Big)\le\alpha_1. \tag{12.26}$$

Then there exists an integer $j_0$ such that

$$\forall s,t\in T,\quad\varphi_{j_0}(s,t)\le1, \tag{12.27}$$

and for $n\ge1$ an integer $j_n$ with

$$\mu\big(\{s\in T;\ \varphi_{j_n}(s,0)\le2^n\}\big)\ge N_n^{-1}, \tag{12.28}$$

for which

$$\sum_{n\ge0}2^nr^{-j_n}\le KM. \tag{12.29}$$


Theorem 12.3.11 Consider a harmonizable infinitely divisible process and integers $j_n\in\mathbb Z$, $n\ge0$, that satisfy the conditions (12.27) and (12.28). Then we can split the Lévy measure into three parts $\nu^1,\nu^2,\nu^3$, such that $\nu^1$, the restriction of $\nu$ to the set $\{\beta;\ |\beta(0)|\ge2r^{-j_0}\}$, is such that its total mass $|\nu^1|$ satisfies $|\nu^1|\le L$ and that

$$\int|\beta(0)|\,d\nu^2(\beta)\le K\sum_{n\ge0}2^nr^{-j_n}, \tag{12.30}$$

$$\gamma_2(T,d)\le K\sum_{n\ge0}2^nr^{-j_n}, \tag{12.31}$$

where the distance $d$ is given by

$$d(s,t)^2=\int|\beta(s)-\beta(t)|^2\,d\nu^3(\beta). \tag{12.32}$$


Theorem 12.3.12 When the Lévy measure is as in Theorem 12.3.11, the process $(X_t)_{t\in T}$ is almost surely bounded.


Keeping in mind the representation (12.12), this is proved by considering separately the case of $\nu^\ell$ for $\ell=1,2,3$. For $\ell=1$, a.s. there are only finitely many $Z_i$ since $\nu^1$ is of finite mass. For $\ell=2$ we have $E\sum_i|Z_i(0)|<\infty$, and $|\sum_i\varepsilon_iZ_i(t)|\le\sum_i|Z_i(0)|$ since $Z_i\in\mathcal C_G$. For $\ell=3$ we use a suitable version of (7.23). Let us stress the content of the previous results.


Theorem 12.3.13 For a harmonizable infinitely divisible process $(X_t)_{t\in T}$ we have $\sup_{t\in T}|X_t|<\infty$ a.s. if and only if there exist integers $j_0,(j_n)_{n\ge1}$ in $\mathbb Z$ satisfying (12.27) and (12.28) for which $\sum_{n\ge0}2^nr^{-j_n}<\infty$.


This theorem should be compared to Theorem 7.5.16. 


12.3.5 Example: Harmonizable p-Stable Processes 


Let us illustrate Theorem 12.3.13 in the simpler case of “harmonizable $p$-stable processes”, where $1\le p<2$. By definition such a process is infinitely divisible and its Lévy measure $\nu$ is obtained by the following construction: starting with a finite measure $m$ on $G$, $\nu$ is the image on $\mathcal C_G$ of the measure $\mu\otimes m$ on $\mathbb R^+\times G$ under the map $(x,\chi)\mapsto x\chi$, where $\mu$ has density $x^{-p-1}$ with respect to Lebesgue's measure on $\mathbb R^+$. In that case, for a certain constant $C_p$, we have $\varphi_j(s,t)=C_pr^{jp}d(s,t)^p$ for a certain distance $d$ on $T$. We explore the situation through a sequence of exercises.⁷


Exercise 12.3.14

(a) When $\varphi_j(s,t)=C_pr^{jp}d(s,t)^p$ for $p>1$, prove that there exists a sequence $(j_n)$ satisfying the conditions (12.27) and (12.28) as well as $\sum_{n\ge0}2^nr^{-j_n}<\infty$ if and only if $\gamma_q(T,d)<\infty$, where $q$ is the conjugate exponent of $p$. Hint: Basically because $\varphi_j(s,t)\le2^n$ if and only if $d(s,t)\le2^{n/p}r^{-j}$.

(b) When $p=1$ prove that this is the case if and only if there exists a sequence $(\varepsilon_n)_{n\ge0}$ such that $\sum_{n\ge0}\varepsilon_n<\infty$ and

$$\mu\big(\{s\in T;\ d(s,0)\le\varepsilon_n\}\big)\ge N_n^{-1}. \tag{12.33}$$


Exercise 12.3.15 In the case $p=1$, prove that the condition $\sum_n\varepsilon_n<\infty$ is equivalent to the condition $\gamma_\infty(T,d)<\infty$, where the quantity $\gamma_\infty(T,d)$ is defined in (5.20).


Exercise 12.3.16 Prove that the Lévy measure $\nu$ of a harmonizable $p$-stable process satisfies $\nu(\{\beta;\ |\beta(0)|\ge u\})\le Cu^{-p}$ for a constant $C$ independent of $u$.


Exercise 12.3.17 Prove that if $1<p<2$ then $E\sup_{t\in T}|X_t|<\infty$ if and only if $\gamma_q(T,d)<\infty$. Prove that if $p=1$, then $\sup_{t\in T}|X_t|<\infty$ a.s. if and only if there exists a sequence $(\varepsilon_n)$ such that $\sum_n\varepsilon_n<\infty$ and (12.33) holds. Hint: Use the previous exercise.


⁷ The sketches of proofs of these exercises are especially concise.
Exercise 12.3.18 In the case of harmonizable 1-stable processes, prove the estimate $P(\sup_{t\in T}|X_t|\ge u)\le C/u$ for a number $C$ independent of $u$. Hint: Use Exercise 12.3.16.


12.4 Proofs: The Bracketing Theorem 


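Proof of Theorem 12.3.7 The plan is to use Theorem 9.4.1. Given $u\ge1$, Lemma 12.1.3 (b) implies that for each $n$ and each $A\in\mathcal A_n$ we have

$$P\Big(\sum_{i\ge1}\big(r^{2j_n(A)}h_A(Z_i)^2\wedge1\big)\le u2^{n+2}\Big)\ge1-\exp(-u2^{n+1}).$$

Consequently, the r.v. $U$ defined as

$$U=\sup\Big\{2^{-n-2}\sum_{i\ge1}\big(r^{2j_n(A)}h_A(Z_i)^2\wedge1\big);\ n\ge0,\ A\in\mathcal A_n\Big\}$$

satisfies $P(U>u)\le\sum_{n\ge0}N_n\exp(-u2^{n+1})\le L\exp(-u)$ for $u\ge L$. In particular $EU\le L$. We observe the fundamental fact: if $s,t\in A\in\mathcal A_n$ then

$$\sum_{i\ge1}\big(r^{2j_n(A)}|Z_i(s)-Z_i(t)|^2\wedge1\big)\le\sum_{i\ge1}\big(r^{2j_n(A)}h_A(Z_i)^2\wedge1\big)\le2^n\cdot4U,$$

and therefore using (9.44) with $p=1$ and $u=4U$, we obtain

$$E_\varepsilon\sup_{t\in T}\Big|\sum_{i\ge1}\varepsilon_iZ_i(t)\Big|\le KUS+K\sum_{i\ge1}h_T(Z_i)\mathbf 1_{\{2h_T(Z_i)\ge r^{-j_0(T)}\}}, \tag{12.34}$$

where $K$ depends on $r$ only. Since $EU\le L$, taking expectation yields

$$E\sup_{t\in T}\Big|\sum_{i\ge1}\varepsilon_iZ_i(t)\Big|\le KS+KE\sum_{i\ge1}h_T(Z_i)\mathbf 1_{\{2h_T(Z_i)\ge r^{-j_0(T)}\}}. \tag{12.35}$$

Now (12.7) yields

$$E\sum_{i\ge1}h_T(Z_i)\mathbf 1_{\{2h_T(Z_i)\ge r^{-j_0(T)}\}}=\int h_T(\beta)\mathbf 1_{\{2h_T(\beta)\ge r^{-j_0(T)}\}}\,d\nu(\beta),$$

and (12.24) proves that this quantity is $\le S$. Combining with (12.35) proves that $E\sup_{t\in T}|X_t|\le KS$. □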
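The only quantitative input about $U$ in this proof is a union bound over $n\ge0$ and $A\in\mathcal A_n$, with $\mathrm{card}\,\mathcal A_n\le N_n=2^{2^n}$, written here as $\sum_{n\ge0}N_n\exp(-u2^{n+1})$. As a numerical illustration (Python, standard library only; the truncation at $n=60$ is a choice made here and is harmless because later terms are astronomically small), the sketch checks that $e^u\sum_{n\ge0}N_n\exp(-u2^{n+1})$ stays below $1$ and decreases for the sampled values $u\ge1$: the double-exponential growth of $N_n$ is beaten by the factor $\exp(-u2^{n+1})$ as soon as $2u>\log2$.

```python
import math

def union_bound_ratio(u, n_max=60):
    """e^u * sum_{n=0}^{n_max} N_n exp(-u 2^(n+1)) with N_n = 2^(2^n); terms are formed in log space."""
    total = 0.0
    for n in range(n_max + 1):
        log_term = (2 ** n) * math.log(2.0) - u * 2 ** (n + 1) + u
        total += math.exp(log_term)  # underflows harmlessly to 0.0 for large n
    return total

for u in (1.0, 2.0, 5.0, 10.0):
    print(u, union_bound_ratio(u))
```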
We have imposed condition (12.24) in order to get a clean statement. Its use is simply to control the size of the last term in (12.34). This hypothesis is absolutely inessential: this term is a.s. finite because the sum contains only finitely many non-zero terms. Its size can then be controlled in specific cases through specific methods.

We refer the reader to [132] where it is shown how to deduce recent results of 
[63] from Theorem 12.3.7. 


12.5 Proofs: The Decomposition Theorem for Infinitely Divisible Processes


The decomposition theorem for infinitely divisible processes is a close cousin of Theorem 11.10.3 and also ultimately relies on Theorem 9.2.1. As in the proof of Theorem 11.10.3, a significant level of abstraction is required, so before we get into the details, it is worth giving an outline of the proof. The main idea is to consider the elements of $T$ as functions on $\mathcal C=\mathbb C^T$, that is, to each $t\in T$ we associate the function $\theta(t)$ on $\mathcal C$ given by $\theta(t)(\beta)=\beta(t)$. We will then suitably decompose each function $\theta(t)$ as a sum $\theta(t)=\theta^1(t)+\theta^2(t)$ of two functions on $\mathcal C$, and for $j=1,2$ we will define the process $X_t^j$ as $\sum_{i\ge1}\varepsilon_i\theta^j(t)(Z_i)$. To describe these processes in the language of Definition 12.2.1, for $j=1,2$ let us define a map $\Xi^j:\mathcal C\to\mathcal C=\mathbb C^T$ by the formula $\Xi^j(\beta)(t)=\theta^j(t)(\beta)$, so that $\theta^j(t)(Z_i)=\Xi^j(Z_i)(t)$. Define then the positive measure $\nu^j$ on $\mathcal C$ as the image of $\nu$ under the map $\Xi^j$. It is simple to see that the points $\Xi^j(Z_i)$ arise from a Poisson point process of intensity measure $\nu^j$, so that $\nu^j$ is the Lévy measure of the process $(X_t^j)_{t\in T}$.

This having been spelled out, to lighten notation we consider $T$ as a space of functions on $\mathcal C$ by simply identifying an element $t\in T$ with the function $\beta\mapsto\beta(t)$ on $\mathcal C$, so that we write $t(Z_i)$ rather than $Z_i(t)$.

We first prove a suitable version of Bernstein’s inequality. 


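Lemma 12.5.1 Consider a function $u$ on $\mathcal C$. Assume that $\|u\|_2<\infty$, where $\|u\|_2^2=\int_{\mathcal C}|u(\beta)|^2\,d\nu(\beta)$, and that $\|u\|_\infty=\sup_{\beta\in\mathcal C}|u(\beta)|<\infty$. Then for $v\ge0$ we have

$$P\Big(\Big|\sum_{i\ge1}\varepsilon_iu(Z_i)\Big|\ge v\Big)\le2\exp\Big(-\frac1L\min\Big(\frac{v^2}{\|u\|_2^2},\frac{v}{\|u\|_\infty}\Big)\Big). \tag{12.36}$$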


Proof Leaving some convergence details to the reader,⁸ we get

$$E_\varepsilon\exp\Big(\lambda\sum_{i\ge1}\varepsilon_iu(Z_i)\Big)=\prod_{i\ge1}\cosh\lambda u(Z_i)=\exp\Big(\sum_{i\ge1}\log\cosh\lambda u(Z_i)\Big),$$

and taking expectation and using (12.8) (for the function $\log\cosh\lambda u$), we obtain

$$E\exp\Big(\lambda\sum_{i\ge1}\varepsilon_iu(Z_i)\Big)=\exp\int\big(\cosh\lambda u(\beta)-1\big)\,d\nu(\beta).$$

Since $\cosh x\le1+Lx^2$ for $|x|\le1$, it follows that for $\lambda\|u\|_\infty\le1$ we have $\cosh\lambda u(\beta)-1\le L\lambda^2u(\beta)^2$ and then

$$E\exp\Big(\lambda\sum_{i\ge1}\varepsilon_iu(Z_i)\Big)\le\exp\Big(L\lambda^2\int u(\beta)^2\,d\nu(\beta)\Big),$$

and as in the proof of Bernstein's inequality (Lemma 4.5.6), this implies (12.36). □

⁸ It might be a good idea here to review Exercise 6.1.2.
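Keeping track of constants in this proof gives an admissible value of $L$. Since $(\cosh x-1)/x^2$ is increasing in $|x|$, we have $\cosh x-1\le(\cosh1-1)x^2\le0.55x^2$ for $|x|\le1$; optimizing the resulting Chernoff bound $\exp(0.55\lambda^2\|u\|_2^2-\lambda v)$ over $0<\lambda\le1/\|u\|_\infty$ yields (12.36) in the explicit form $2\exp\big(-\min(v^2/(2.2\|u\|_2^2),\,v/(2\|u\|_\infty))\big)$. These constants are a computation done here, not the text's. The Monte Carlo sketch below (Python, standard library only) compares this bound with empirical tails in an assumed toy case: $\nu$ is $10$ times Lebesgue measure on $[0,1)$ and $u(x)=x$, so that $\|u\|_2^2=10/3$ and $\|u\|_\infty=1$.

```python
import math
import random

def explicit_bound(v, norm2_sq, norm_inf):
    """(12.36) with L made explicit as computed above: 2 exp(-min(v^2 / (2.2 ||u||_2^2), v / (2 ||u||_inf)))."""
    return 2.0 * math.exp(-min(v * v / (2.2 * norm2_sq), v / (2.0 * norm_inf)))

def poisson(rng, a):
    """Poisson r.v. of expectation a (Knuth's multiplication method)."""
    threshold, n, prod = math.exp(-a), 0, rng.random()
    while prod > threshold:
        n += 1
        prod *= rng.random()
    return n

rng = random.Random(2)
c, trials = 10.0, 40_000
norm2_sq, norm_inf = c / 3.0, 1.0  # u(x) = x and nu = c * Lebesgue on [0, 1)

# Each sample is sum_i eps_i u(Z_i) with Bernoulli signs eps_i and Poisson points Z_i.
samples = [
    sum(rng.choice((-1.0, 1.0)) * rng.random() for _ in range(poisson(rng, c)))
    for _ in range(trials)
]
for v in (1.0, 3.0, 6.0):
    empirical = sum(abs(s) >= v for s in samples) / trials
    print(v, empirical, explicit_bound(v, norm2_sq, norm_inf))
```

The sample second moment should also be close to $\|u\|_2^2$, since by (12.7) $E(\sum_i\varepsilon_iu(Z_i))^2=\int u^2\,d\nu$.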


Proof of Theorem 12.3.3 It follows from (12.36) that for $s,t\in T$ and $v>0$, we have

$$P\Big(\Big|\sum_{i\ge1}\varepsilon_i\big(s(Z_i)-t(Z_i)\big)\Big|\ge v\Big)\le2\exp\Big(-\frac1L\min\Big(\frac{v^2}{d_2(s,t)^2},\frac{v}{d_\infty(s,t)}\Big)\Big),$$

where the distances $d_2$ and $d_\infty$ are those defined in (12.17) and (12.18), so that (12.19) follows from Theorem 4.5.13. □


The next result is in the spirit of the Giné-Zinn theorem (Sect. 11.8).

Theorem 12.5.2 We have

$$E\sup_{t\in T}|X|_t\le\sup_{t\in T}\int|t(\beta)|\,d\nu(\beta)+4E\sup_{t\in T}|X_t|.$$

Proof Consider a subset $A\subset\mathcal C$ with $\nu(A)<\infty$. Consider an independent sequence $(Y_i)_{i\le N}$ of r.v.s which are distributed according to the probability $P$ on $A$ given by $P(B)=\nu(B\cap A)/\nu(A)$. Consider an independent sequence $(\varepsilon_i)_{i\ge1}$ of Bernoulli r.v.s which is independent of the sequence $(Y_i)$. We apply (11.37), using that $E|t(Y_i)|=\nu(A)^{-1}\int_A|t(\beta)|\,d\nu(\beta)$, and we obtain

$$E\sup_{t\in T}\sum_{i\le N}|t(Y_i)|\le\frac{N}{\nu(A)}\sup_{t\in T}\int_A|t(\beta)|\,d\nu(\beta)+4E\sup_{t\in T}\Big|\sum_{i\le N}\varepsilon_it(Y_i)\Big|. \tag{12.37}$$

Consider then a Poisson point process $(Z_i)$ of intensity measure $\nu$ and let $N=\mathrm{card}\{i\ge1;\ Z_i\in A\}$. Given $N$, according to Lemma 12.1.1, the points $Z_i$ which belong to $A$ are distributed like an independent sequence $(Y_i)_{i\le N}$, where $Y_i$ is distributed according to the probability $P$ above. We use (12.37) given $N$ for the sequence $(Y_i)_{i\le N}$, and we take expectation to obtain (since $EN=\nu(A)$)

$$\begin{aligned}E\sup_{t\in T}\sum_{i\ge1}|t(Z_i)|\mathbf 1_A(Z_i)&\le\sup_{t\in T}\int_A|t(\beta)|\,d\nu(\beta)+4E\sup_{t\in T}\Big|\sum_{i\ge1}\varepsilon_it(Z_i)\mathbf 1_A(Z_i)\Big|\\&\le\sup_{t\in T}\int|t(\beta)|\,d\nu(\beta)+4E\sup_{t\in T}\Big|\sum_{i\ge1}\varepsilon_it(Z_i)\Big|,\end{aligned}\tag{12.38}$$

by using Jensen's inequality in the second line (i.e., taking expectation in the r.v.s $\varepsilon_i$ for which $\mathbf 1_A(Z_i)=0$ outside the supremum and the absolute value rather than inside). The result follows since $A$ is arbitrary. □


Let us now prepare for the proof of Theorem 12.3.5. Without loss of generality, we may assume that $0\in T$, so that $E\sup_{t\in T}|X_t|\le2S$ by Lemma 2.2.1.


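Lemma 12.5.3 Let $j_0=j_0(T)$ be as in Theorem 12.3.1. Then we have

$$\forall t\in T,\quad\int|t|\mathbf 1_{\{2|t|>r^{-j_0}\}}\,d\nu\le LS. \tag{12.39}$$

Proof Using (12.15) for $n=0$ and since $0\in T$ we have

$$\int r^{2j_0}|t|^2\wedge1\,d\nu\le4,$$

so that by Markov's inequality (since $r^{2j_0}|t|^2\wedge1>1/4$ on the set below) the set $U:=\{2|t|>r^{-j_0}\}$ satisfies $\nu(U)\le16$ and

$$\int|t|\mathbf 1_{\{2|t|>r^{-j_0}\}}\,d\nu=\int_U|t|\,d\nu. \tag{12.40}$$

Consider the event $\mathcal E$ given by $\mathrm{card}\{i\ge1;\ Z_i\in U\}=1$. We lighten notation by assuming that the r.v.s $Z_i$ are numbered in such a way that $Z_1\in U$ when $\mathcal E$ occurs. According to Lemma 12.1.1, conditionally on $\mathcal E$ the r.v. $Z_1$ is uniformly distributed on $U$ (i.e., distributed according to the probability $\nu(\cdot\cap U)/\nu(U)$), so that

$$\frac{1}{P(\mathcal E)}E\mathbf 1_{\mathcal E}|t(Z_1)|=\frac{1}{\nu(U)}\int_U|t|\,d\nu.$$

Furthermore, since $(Z_i)$ is a Poisson point process of intensity $\nu$, we have $P(\mathcal E)=\nu(U)\exp(-\nu(U))\ge\nu(U)/L$, so that

$$\int_U|t|\,d\nu\le LE\mathbf 1_{\mathcal E}|t(Z_1)|. \tag{12.41}$$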
U 
Now, denoting by $E_\varepsilon$ expectation in the r.v.s $\varepsilon_i$ only, we have, using Jensen's inequality in the first inequality,

$$\mathbf 1_{\mathcal E}|t(Z_1)|=E_\varepsilon\mathbf 1_{\mathcal E}|\varepsilon_1t(Z_1)|\le E_\varepsilon\mathbf 1_{\mathcal E}\Big|\sum_{i\ge1}\varepsilon_it(Z_i)\Big|=E_\varepsilon\mathbf 1_{\mathcal E}|X_t|\le E_\varepsilon|X_t|.$$

Taking expectation we obtain $E\mathbf 1_{\mathcal E}|t(Z_1)|\le E|X_t|\le2S$, and using (12.41) and (12.40) concludes the proof. □


Proposition 12.5.4 There exists a decomposition $T\subset T_1+T_2$ such that $0\in T_1$ and

$$\gamma_2(T_1,d_2)\le KS,\quad\gamma_1(T_1,d_\infty)\le KS, \tag{12.42}$$

$$\sup_{t\in T_2}\int|t(\beta)|\,d\nu(\beta)\le KS. \tag{12.43}$$

Here $d_2$ and $d_\infty$ are as always the distances induced by the $L^2$ and the $L^\infty$ norm when $T$ is seen as a space of functions on the measured space $(\mathcal C,\nu)$. These are the same distances as in (12.17) and (12.18).


Proof We combine Theorem 12.3.1 with Theorem 9.2.1, calling $T_2$ what is called $T_2+T_3$ there. Lemma 12.5.3 asserts that $\int|t(\beta)|\,d\nu(\beta)\le KS$ for $t\in T_3$. □


Proof of Theorem 12.3.5 Consider the decomposition of $T$ provided by Proposition 12.5.4. Combining (12.42) with Theorem 12.3.3 yields $E\sup_{t\in T_1}X_t\le KS$. Since $0\in T_1$, combining with Lemma 2.2.1 yields $E\sup_{t\in T_1}|X_t|\le KS$. We may assume that $T_2\subset T-T_1$, simply by replacing $T_2$ by $T_2\cap(T-T_1)$. Thus

$$E\sup_{t\in T_2}|X_t|\le E\sup_{t\in T}|X_t|+E\sup_{t\in T_1}|X_t|\le KS.$$

Combining with (12.43), Theorem 12.5.2 then implies $E\sup_{t\in T_2}|X|_t\le KS$. Every element $t\in T$ has a decomposition $t=t^1+t^2$ with $t^1\in T_1$ and $t^2\in T_2$. We set $X_t^1=X_{t^1}$ and $X_t^2=X_{t^2}$ to finish the proof, since (12.21) is a consequence of (12.42). □


Key Ideas to Remember 


• Infinitely divisible processes (symmetric, without Gaussian components) are standard fare of probability theory. They can be viewed as sums of certain random series of functions (the terms of which are not independent).

• Our general results about random series of functions apply to this setting and considerably clarify the question of boundedness of such processes. This boundedness can always be witnessed by a suitable use of Bernstein's inequality, or by forgetting about possible cancellations, or by a mixture of both methods.
12.6 Notes and Comments


Theorem 12.3.5 (the decomposition theorem) was first proved in [112] under an extra technical hypothesis. Basically the same proof was given in [132]. While preparing the present edition, I discovered a much simpler proof (still using the same extra technical hypothesis) based on the Latała-Bednorz theorem. Namely, I proved Lemma 11.4.1, and I used the functionals $F_n(A)=\sup_\mu\inf_{t\in A}J_\mu(t)$ (where the supremum is taken over all probability measures $\mu$ with $\mu(A)=1$) and Theorem 8.1.2 to obtain Theorem 12.3.1. The technical condition was necessary to prove that these functionals satisfy the appropriate growth condition. Witold Bednorz and Rafał Martynek [18], who had the early version of this book, combined the method of Lemma 11.4.1 with the use of convexity⁹ as in Lemma 3.3.2 to construct the majorizing measure of Theorem 11.5.1 and to show (in a somewhat complicated manner) that this majorizing measure can be used to perform the appropriate chaining. In this manner, in [18] they proved Theorem 12.3.5 in the slightly weaker form where in (12.22) there is an extra term $K\sup_{t\in T}\int|t|\mathbf 1_{\{2|t|>r^{-j_0}\}}\,d\nu$. This extra term was removed here using the simple Lemma 12.5.3.


⁹ This use of convexity goes back to Fernique [32].


Chapter 13
Unfulfilled Dreams


We have made much progress on several of the dreams which were born in 
Sect. 2.12 (which the reader should review now). Some of this progress is partial; 
in Theorems 6.8.3, 11.12.1, and 12.3.5, we have shown that “chaining explains all 
the boundedness due to cancellation”. But what could we say about boundedness of 
processes where no cancellation occurs? In this chapter, we dream about this, in the 
simplest case of positive selector processes. Our goal does not vary: trying to show 
that when a process is bounded, this can be witnessed in a simple manner: using the 
union bound (through chaining), maybe taking convex hull, or some other simple 
idea (we will use positivity below). The most important material in this chapter is 
in Sect. 13.2, where the analysis leads us to a deep question of combinatorics. The 
author has spent considerable time studying it and offers a prize for its solution. 


13.1 Positive Selector Processes 


Theorem 11.12.1 reduces the study of the boundedness of selector processes to the study of the boundedness of positive selector processes. That is, we have to understand the quantity

$$\delta^*(T)=E\sup_{t\in T}\sum_{i\le M}t_i\delta_i, \tag{13.1}$$

where $T$ is a set of sequences $t=(t_i)_{i\le M}$ with $t_i\ge0$, and where the r.v.s $\delta_i$ are independent, $P(\delta_i=1)=\delta$, and $P(\delta_i=0)=1-\delta$. The study of positive selector processes is in any case fundamental, since, following the same steps as in Sect. 11.11, it is essentially the same problem as understanding the quantity $E\sup_{f\in\mathcal F}\sum_{i\le N}f(X_i)$ when $\mathcal F$ is a class of non-negative functions.
It is difficult to imagine as a function of which geometrical characteristics of $T$ one should evaluate the quantity (13.1). We shall discuss in Sect. 13.4 the case where $T$ consists of indicators of sets, and in particular we shall give concrete examples which illustrate this point.

An important feature of positive selector processes is that we can use positivity
to construct new processes from processes we already know how to bound. To
implement the idea, given a set T, we denote by solid T its “solid convex hull”,
i.e., the set of sequences $(s_i)_{i\le M}$ for which there exists $t\in\operatorname{conv}T$ such that $s_i\le t_i$
for each $i\le M$. Since $\delta_i\ge0$, it should be obvious that

$$\sup_{t\in\operatorname{solid}T}\sum_{i\le M}\delta_it_i=\sup_{t\in T}\sum_{i\le M}\delta_it_i. \qquad (13.2)$$

Taking expectation shows that $\delta^+(\operatorname{solid}T)=\delta^+(T)$. Thus, to bound $\delta^+(T)$ from
above, it suffices to find a set T' for which $T\subset\operatorname{solid}T'$ and such that we control
$\delta^+(T')$. Recalling (2.149), we define


$$S(T)=\inf\Big\{S\ge0;\ \sum_{t\in T}\int_S^\infty\mathsf P\Big(\sum_{i\le M}\delta_it_i\ge u\Big)\,du\le S\Big\}. \qquad (13.3)$$

In Lemma 2.12.1, we proved that $\delta^+(T)\le2S(T)$, so that if $T\subset\operatorname{solid}T'$,
then $\delta^+(T)\le2S(T')$. Wishful thinking, supplemented by a dreadful lack of
imagination¹, leads to the following:


Research Problem 13.1.1 Does there exist a universal constant L such that for any
set T of sequences $t=(t_i)$, $t_i\ge0$, one can find a set T' with $S(T')\le L\delta^+(T)$ and
$T\subset\operatorname{solid}T'$?


13.2 Explicitly Small Events 


Our ultimate goal should be to give a complete description of the quantity $\delta^+(T)$
“as a function of the geometry of the metric space (T, d)”. This motivated Research
Problem 13.1.1. At present, we simply have no clue how to approach this
problem, and in the rest of the chapter, we explore different directions.

We proved that, as a consequence of Theorem 2.11.9, for any Gaussian process
we can find a jointly Gaussian sequence $(u_k)_{k\ge1}$ such that

$$\Big\{\sup_{t\in T}|X_t|\ge L\,\mathsf E\sup_{t\in T}|X_t|\Big\}\subset\bigcup_{k\ge1}\{u_k\ge1\} \qquad (13.4)$$

and moreover

$$\sum_{k\ge1}\mathsf P(u_k\ge1)\le\frac12.$$

¹ In other words, we could not think of any other way to bound $\delta^+(T)$.

The sets $\{u_k\ge1\}$ are simple concrete witnesses that the event on the left-hand side
of (13.4) has a probability at most 1/2.²

Let us explore the same idea for positive selector processes. Does there exist a
universal constant L such that for each set T of sequences $t=(t_i)_{i\ge1}$, $t_i\ge0$, there
exist “simple witnesses” that the event

$$\Big\{\sup_{t\in T}\sum_{i\le M}\delta_it_i\ge L\delta^+(T)\Big\} \qquad (13.5)$$

has a probability at most 1/2?
There is a simple and natural choice for these witnesses. For a finite subset I of
$\{1,\ldots,M\}$, let us consider the event $H_I$ defined by

$$H_I=\{\forall i\in I,\ \delta_i=1\}, \qquad (13.6)$$

so that $\mathsf P(H_I)=\delta^{\operatorname{card}I}$. The events $H_I$ play the role that the half-spaces play for
Gaussian processes in (13.4) (see (2.153)).


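Definition 13.2.1 Given a positive number η > 0, an event Ω is η-small if we can
find a family $\mathcal G$ of subsets I of $\{1,\ldots,M\}$ with

$$\sum_{I\in\mathcal G}\eta^{\operatorname{card}I}\le\frac12 \qquad (13.7)$$

and

$$\Omega\subset\bigcup_{I\in\mathcal G}H_I. \qquad (13.8)$$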
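To make Definition 13.2.1 concrete, here is a brute-force sketch (not from the text) that checks whether a proposed family $\mathcal G$ witnesses that an event is η-small. It enumerates all of $\{0,1\}^M$, so it is only feasible for very small M. The helper name `is_eta_small_witness` is hypothetical, and indices run over 0, …, M−1 rather than 1, …, M.

```python
from itertools import combinations, product

def is_eta_small_witness(event, family, eta, M):
    """Brute-force check of (13.7) and (13.8) on {0,1}^M.

    event:  predicate on a 0/1 tuple x = (x_0, ..., x_{M-1})
    family: iterable of index sets I (subsets of range(M))
    eta:    the parameter eta > 0 of Definition 13.2.1
    """
    family = [frozenset(I) for I in family]
    # (13.7): the sum over the family of eta^{card I} must be at most 1/2
    if sum(eta ** len(I) for I in family) > 0.5:
        return False
    # (13.8): every point of the event must lie in some H_I = {x_i = 1 for all i in I}
    for x in product((0, 1), repeat=M):
        if event(x) and not any(all(x[i] == 1 for i in I) for I in family):
            return False
    return True

# With M = 4, the event "at least two selectors equal 1" is covered by the six
# sets H_I with card I = 2, and 6 * 0.2**2 = 0.24 <= 1/2.
M = 4
pairs = list(combinations(range(M), 2))
print(is_eta_small_witness(lambda x: sum(x) >= 2, pairs, 0.2, M))  # True
# With eta = 0.3 this particular family fails (6 * 0.09 > 1/2); another family might still work.
print(is_eta_small_witness(lambda x: sum(x) >= 2, pairs, 0.3, M))  # False
```

A `False` answer only says that the given family is not a witness. Deciding whether some witness exists is a much harder question.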


The choice of the constant 1/2 in (13.7) is rather arbitrary. Since $\mathsf P(H_I)=\delta^{\operatorname{card}I}$,
a δ-small event has probability at most 1/2, but it is such in an “explicit” way (hence the
title of the section). The sets $H_I$ as in (13.8) are “simple concrete witnesses” of that.
The first point to make is that there exist sets of small probability which do not
look at all like δ-small sets. A typical example is as follows. Let us consider two
integers k, r, and r disjoint subsets $I_1,\ldots,I_r$ of $\{1,\ldots,M\}$, each of cardinality k.

² The existence of these witnesses is not as strong as the information provided by Theorem 2.10.1.
It is easy to deduce it from Theorem 2.10.1, but it does not seem easy to go the other way around.

Let us consider the set
$$A=\{(\delta_i)_{i\le M};\ \forall\ell\le r,\ \exists i\in I_\ell,\ \delta_i=1\}. \qquad (13.9)$$


It is straightforward to see that $\mathsf P(A)=(1-(1-\delta)^k)^r$. In particular, given k, one
can choose r large so that P(A) is small. We leave the following as a teaser to the
reader:

Exercise 13.2.2 Prove that the set A is not 1/k-small. Hint: A carries a probability
measure ν such that $\nu(H_I)\le k^{-\operatorname{card}I}$ for each I.
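The formula $\mathsf P(A)=(1-(1-\delta)^k)^r$ for the event (13.9) can be checked numerically. The sketch below compares it with a Monte Carlo estimate. The function names are illustrative, and only the kr coordinates in $I_1,\ldots,I_r$ are simulated, since the others do not affect A.

```python
import random

def prob_A_exact(delta, k, r):
    """P(A) = (1 - (1 - delta)^k)^r for the event (13.9)."""
    return (1 - (1 - delta) ** k) ** r

def prob_A_monte_carlo(delta, k, r, trials=100_000, seed=0):
    """Estimate P(A) by sampling the selectors block by block."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # A occurs iff every block I_l contains at least one selector equal to 1
        if all(any(rng.random() < delta for _ in range(k)) for _ in range(r)):
            hits += 1
    return hits / trials

print(prob_A_exact(0.2, 5, 3), prob_A_monte_carlo(0.2, 5, 3))
print(prob_A_exact(0.2, 5, 50))  # about 2.4e-9: small once r is large
```

The exercise contrasts this smallness with the fact that A is not 1/k-small. Its probability is tiny, yet no cheap family of sets $H_I$ certifies it.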


The following asks if the event (13.5) is “explicitly small”: 


Research Problem 13.2.3 Is it true that we can find a universal constant L such
that for any class of sequences T as in (13.5), the event

$$\Big\{\sup_{t\in T}\sum_{i\le M}\delta_it_i\ge L\delta^+(T)=L\,\mathsf E\sup_{t\in T}\sum_{i\le M}\delta_it_i\Big\} \qquad (13.10)$$

is δ-small?

Even proving that the set (13.10) is αδ-small, where α is some universal constant,
would be of interest. The main result of Sect. 13.4 is a positive answer to this
problem when T consists of indicators of sets.


Proposition 13.2.4 If Problem 13.1.1 has a positive answer, then so does Prob- 
lem 13.2.3. 


In view of (13.2), this proposition is an immediate consequence of the following, 
where S(T) is defined in (13.3): 


Proposition 13.2.5 For any set T, the event

$$\Big\{\sup_{t\in T}\sum_{i\le M}\delta_it_i\ge LS(T)\Big\}$$

is δ-small.


This result is very much weaker than a positive answer to Problem 13.2.3
because we expect S(T) to be typically infinite or much larger than $\delta^+(T)$.³ Thus,
Proposition 13.2.4 achieves little more than checking that our conjectures are not
blatantly wrong, and it seems better to refer to [132] for a proof.⁴

³ It would be an astonishing fact if it were true that $S(T)\le L\delta^+(T)$, and proving it would be a
sensational result.

⁴ We do not reproduce this proof here because it uses the rather complicated Theorem 11.1 of
[131], and we hope that a creative reader will invent a better argument.

Problem 13.2.3 motivates a more general question, which the author believes to be
of fundamental importance.⁵ It is considerably easier to explain this question if we
identify $\{0,1\}^M$ with the class $\mathcal M$ of subsets of $M^*:=\{1,\ldots,M\}$ in the obvious
manner, identifying a point $(x_i)_{i\le M}\in\{0,1\}^M$ with the set $\{i\le M;\ x_i=1\}$. We do
this throughout this section.

We first explain the central combinatorial definition. 


Definition 13.3.1 Given a class $D\subset\mathcal M$ and an integer q, we define the class
$D^{(q)}\subset\mathcal M$ as the class of subsets of $M^*$ which are not included in the union of any
q subsets of $M^*$ belonging to D.


It is useful to think of the points of $D^{(q)}$ as being “far from D”. To make sure
you understand the definition, convince yourself that if for an integer k we have
$$D=\{J\subset M^*;\ \operatorname{card}J\le k\}, \qquad (13.11)$$
then
$$D^{(q)}=\{J\subset M^*;\ \operatorname{card}J\ge kq+1\}. \qquad (13.12)$$
Given $0<\delta<1$ and the corresponding independent sequence $(\delta_i)_{i\le M}$, let us
denote by $\mathsf P_\delta$ the law of the random set $\{i\le M;\ \delta_i=1\}\in\mathcal M$.
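Identity (13.12) can be confirmed by brute force for small M. The sketch below computes $D^{(q)}$ straight from Definition 13.3.1 by forming every union of q members of D. It uses the index set {0, …, M−1} in place of $M^*$, and the helper names are hypothetical.

```python
from itertools import combinations, product

def subsets(M):
    """All subsets of {0, ..., M-1}, as frozensets."""
    return [frozenset(c) for r in range(M + 1) for c in combinations(range(M), r)]

def far_class(D, q, M):
    """D^(q): subsets of {0,...,M-1} not contained in the union of any q members of D."""
    unions = {frozenset().union(*choice) for choice in product(D, repeat=q)}
    return {J for J in subsets(M) if not any(J <= U for U in unions)}

M, k, q = 6, 1, 2
D = [J for J in subsets(M) if len(J) <= k]                  # the class (13.11)
expected = {J for J in subsets(M) if len(J) >= k * q + 1}   # the class (13.12)
print(far_class(D, q, M) == expected)  # True
```

The enumeration over all q-tuples of D grows like $(\operatorname{card}D)^q$. It is meant only as a check of the definition, not as a practical algorithm.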


Research Problem 13.3.2 Prove (or disprove) that there exists an integer q with
the following property. Consider any value of δ, any value of M, and any subset D
of $\mathcal M$ with $\mathsf P_\delta(D)\ge1-1/q$. Then the set $D^{(q)}$ is δ-small.

In other words, we simply look for ε > 0 small and q large such that $D^{(q)}$ is
δ-small whenever $\mathsf P_\delta(D)\ge1-\varepsilon$.

To understand this problem, it helps to analyze the example (13.11). The set
$H_I$ of (13.6) is now described by $H_I=\{J\in\mathcal M;\ I\subset J\}$. According to (13.12), we
have

$$D^{(q)}\subset\bigcup_{I\in\mathcal G}H_I,$$

where $\mathcal G=\{I\in\mathcal M;\ \operatorname{card}I=kq+1\}$. Thus, using the elementary inequality

$$\binom nk\le\Big(\frac{en}k\Big)^k, \qquad (13.13)$$

we obtain

$$\sum_{I\in\mathcal G}\delta^{\operatorname{card}I}=\binom M{kq+1}\delta^{kq+1}\le\Big(\frac{eM\delta}{kq+1}\Big)^{kq+1}. \qquad (13.14)$$

⁵ Far more so than Problem 13.2.3 itself.


It is elementary to show that when $\mathsf P_\delta(D)\ge1/2$, one has $k\ge\delta M/L$. Then
$eM\delta/(kq+1)\le eL/q$, so it follows that if q is a large enough universal constant,
the right-hand side of (13.14) is ≤ 1/2. That is, we have proved that sets D of the type
(13.11) have the property described in Problem 13.3.2 that $D^{(q)}$ is δ-small.
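A short numerical look at (13.14) shows how q must grow before the witness mass falls below 1/2. The parameters M = 1000, δ = 0.01 and k = 10 = δM are illustrative only. The code evaluates both sides of (13.14) for several q and makes no claim about $\mathsf P_\delta(D)$ itself.

```python
from math import comb, e

def witness_mass(M, delta, k, q):
    """Left side of (13.14): C(M, kq+1) * delta^(kq+1)."""
    s = k * q + 1
    return comb(M, s) * delta ** s

def witness_bound(M, delta, k, q):
    """Right side of (13.14), from (13.13): (e M delta / (kq+1))^(kq+1)."""
    s = k * q + 1
    return (e * M * delta / s) ** s

M, delta, k = 1000, 0.01, 10   # illustrative values with k = delta * M
for q in (1, 2, 5, 10):
    print(q, witness_mass(M, delta, k, q), witness_bound(M, delta, k, q))
```

For q = 1 the mass is far above 1/2. Already for q = 5 the bound is astronomically small, in line with the claim that a large universal q suffices.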

To believe that Problem 13.3.2 has a positive solution, one has to believe that the 
simple case above is “extremal”, i.e., “the worst possible”. It might be possible to 
provide a negative solution to Problem 13.3.2 in a few lines: it “suffices” to invent a 
new type of set D to solve it negatively! 

A solution to Problem 13.3.2 will be rewarded by a $1000 prize, even if it applies
only to sufficiently small values of δ. It seems probable that progress on this problem
requires methods unrelated to those of this book. A simple positive result in the right
direction is provided in the next section.


Proposition 13.3.3 A positive solution to Problem 13.3.2 implies a positive solu- 
tion to Problem 13.2.3. 


Proof Let q be as provided by the positive solution of Problem 13.3.2. It follows
from Markov's inequality that the event $\{\sup_{t\in T}\sum_{i\le M}\delta_it_i\le q\delta^+(T)\}$ has
probability $\ge1-1/q$, which in our current language means that $\mathsf P_\delta(D)\ge1-1/q$,
where $D=\{J\in\mathcal M;\ \sup_{t\in T}\sum_{i\in J}t_i\le q\delta^+(T)\}$. Now, if $J^1,\ldots,J^q\in D$ and
$J=\bigcup_{\ell\le q}J^\ell$, it is obvious (since $t_i\ge0$) that $\sup_{t\in T}\sum_{i\in J}t_i\le q^2\delta^+(T)$. Consequently,

$$\Big\{J\in\mathcal M;\ \sup_{t\in T}\sum_{i\in J}t_i>q^2\delta^+(T)\Big\}\subset D^{(q)}, \qquad (13.15)$$

and the positive solution of Problem 13.3.2 asserts that this set is δ-small. □


The author has spent considerable energy on Problem 13.3.2. It would not be 
realistic to attempt to convey the depth of this problem in a few pages. A sequence 
of conjectures of increasing strength, of which a positive answer to (a weak version 
of) Problem 13.3.2 is the weakest, can be found in [131]. 


13.4 Classes of Sets 


In this section, we consider positive selector processes in the simpler case where T
consists of indicators of sets. Please note that the notation does not coincide
with that of the previous section: it is the elements of T which are now identified
with points of $\mathcal M$. Considering a class $\mathcal J$ of subsets of $M^*=\{1,\ldots,M\}$, we try to
bound the quantity

$$\delta(\mathcal J):=\mathsf E\sup_{J\in\mathcal J}\sum_{i\in J}\delta_i.$$

The main result of this section is Theorem 13.4.4. Before we come to it, we explore
a few naive ways to bound $\delta(\mathcal J)$.


Proposition 13.4.1 Assume that for some number S > 0, we have

$$\sum_{J\in\mathcal J}\Big(\frac{\delta\operatorname{card}J}S\Big)^S\le1. \qquad (13.16)$$

Then $\delta(\mathcal J)\le LS$.


Proof We first observe that by (13.16), each term in the summation is ≤ 1, so that
$\delta\operatorname{card}J\le S$ whenever $J\in\mathcal J$, and thus $u\ge6\delta\operatorname{card}J$ whenever $u\ge6S$. We then
simply use the union bound and Lemma 11.12.3 to obtain that for $u\ge6S$,

$$\mathsf P\Big(\sup_{J\in\mathcal J}\sum_{i\in J}\delta_i\ge u\Big)\le\sum_{J\in\mathcal J}\mathsf P\Big(\sum_{i\in J}\delta_i\ge u\Big),$$

where each term of the right-hand side is then bounded using (11.70).

To finish the proof, it is enough to integrate in u the previous inequality and to
use (13.16) and simple estimates. □


For a class $\mathcal J$ of sets, let us define $S_\delta(\mathcal J)$ as the infimum of the numbers S for
which (13.16) holds. Thus, the inequality $\delta(\mathcal J)\le LS$ implies
$$\delta(\mathcal J)\le LS_\delta(\mathcal J). \qquad (13.17)$$

Exercise 13.4.2 Prove that the inequality (13.17) cannot be reversed. That is, given
A > 0, construct a class $\mathcal J$ of sets for which $A\delta(\mathcal J)\le S_\delta(\mathcal J)$. Hint: Consider many
disjoint sets of the same cardinality.


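Given a class $\mathcal J$ of sets and two integers n and m, let us define the class $\mathcal J(n,m)$
as follows:

$$I\in\mathcal J(n,m)\iff\exists J_1,\ldots,J_n\in\mathcal J,\ \operatorname{card}\Big(I\setminus\bigcup_{\ell\le n}J_\ell\Big)\le m. \qquad (13.18)$$

Then for each realization of the r.v.s $(\delta_i)$, and for I and $J_1,\ldots,J_n$ as in (13.18), one has

$$\sum_{i\in I}\delta_i\le m+\sum_{\ell\le n}\sum_{i\in J_\ell}\delta_i,$$

and consequently
$$\delta(\mathcal J(n,m))\le n\delta(\mathcal J)+m. \qquad (13.19)$$
Combining (13.19) and (13.17), one obtains
$$\delta(\mathcal J(n,m))\le LnS_\delta(\mathcal J)+m. \qquad (13.20)$$
In particular, taking n = 1, for two classes $\mathcal I$ and $\mathcal J$ of sets, one has
$$\mathcal I\subset\mathcal J(1,m)\ \Rightarrow\ \delta(\mathcal I)\le LS_\delta(\mathcal J)+m,$$
and thus
$$\delta(\mathcal I)\le L\inf\{S_\delta(\mathcal J)+m;\ \mathcal I\subset\mathcal J(1,m)\}, \qquad (13.21)$$

where the infimum is over all classes of sets $\mathcal J$ and all m for which $\mathcal I\subset\mathcal J(1,m)$.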
The following (very) challenging exercise disproves a most unfortunate conjecture
stated in [130] and [131], which overlooked the possibility of taking n ≥ 2
in (13.20):

Exercise 13.4.3 Using the case n = 2, m = 0 of (13.20), prove that the
inequality (13.21) cannot be reversed. That is, given A > 0 (however large),
construct a class of sets $\mathcal I$ such that $A\delta(\mathcal I)\le S_\delta(\mathcal J)+m$ for each class of sets
$\mathcal J$ and each m for which $\mathcal I\subset\mathcal J(1,m)$.

In words, we can prove that (13.21) cannot be reversed because we have found a
genuinely different way to bound $\delta(\mathcal I)$, namely, (13.20) for n = 2.

In the same line as Exercise 13.4.3, it would seem worth investigating whether,
given a number A, we can construct a class of sets $\mathcal I$ such that $A\delta(\mathcal I)\le nS_\delta(\mathcal J)+m$
whenever $\mathcal I\subset\mathcal J(n,m)$. This seems plausible, because we have a (seemingly) more
general way to bound $\delta(\mathcal I)$ than (13.19), namely, the “solid convex hull” method of
Sect. 13.1.

In the remainder of this section, we prove the following: 


Theorem 13.4.4 ([130]) For any class $\mathcal J$ of subsets of $M^*$, the event (13.10)

$$\Big\{\sup_{J\in\mathcal J}\sum_{i\in J}\delta_i\ge L\delta(\mathcal J)\Big\}$$

is δ-small.


That is, Problem 13.2.3 has a positive solution when T consists of indicators of 
sets. This result is a simple consequence of the following: 
Proposition 13.4.5 Consider a class $\mathcal J$ of subsets of $M^*$ and an integer n. If the
event

$$\Big\{\sup_{J\in\mathcal J}\sum_{i\in J}\delta_i\ge n\Big\} \qquad (13.22)$$

is not δ-small, then

$$\delta(\mathcal J)\ge n/L_0. \qquad (13.23)$$


Proof of Theorem 13.4.4 Considering a class $\mathcal J$ of subsets of $M^*$, we will prove
that the event

$$\Big\{\sup_{J\in\mathcal J}\sum_{i\in J}\delta_i\ge3L_0\delta(\mathcal J)\Big\} \qquad (13.24)$$

is δ-small. Assume for contradiction that this is not the case. Then for $n\le3L_0\delta(\mathcal J)$,
the larger event (13.22) is not δ-small, and (13.23) shows that $n\le L_0\delta(\mathcal J)$. Thus,
whenever $n\le3L_0\delta(\mathcal J)$, we also have $n\le L_0\delta(\mathcal J)$. This means that there is no
integer in the interval $]L_0\delta(\mathcal J),3L_0\delta(\mathcal J)]$, so that this interval is of length < 1, i.e.,
$2L_0\delta(\mathcal J)<1$. Thus, (13.23) fails for n = 1, so that by Proposition 13.4.5 again the
event $\{\sup_{J\in\mathcal J}\sum_{i\in J}\delta_i\ge1\}=\{\sup_{J\in\mathcal J}\sum_{i\in J}\delta_i>0\}$ is δ-small, and the smaller
event (13.24) is δ-small. This contradiction finishes the proof. □


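We start the proof of Proposition 13.4.5. We fix n once and for all, and we define
$$\mathcal J'=\{J'\in\mathcal M;\ \operatorname{card}J'=n,\ \exists J\in\mathcal J,\ J'\subset J\}. \qquad (13.25)$$

We observe that

$$\Big\{\sup_{J\in\mathcal J}\sum_{i\in J}\delta_i\ge n\Big\}=\Big\{\sup_{J\in\mathcal J'}\sum_{i\in J}\delta_i\ge n\Big\}. \qquad (13.26)$$

For an integer $1\le k\le n$, we set

$$d(k)=2\Big(\frac{4en\delta}k\Big)^k. \qquad (13.27)$$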
Lemma 13.4.6 Assume that the event (13.26) is not δ-small. Then there exists a
probability measure ν on $\mathcal J'$ with the following property: For each set $A\subset M^*$
with $1\le\operatorname{card}A\le n$, we have

$$\nu(\{J\in\mathcal J';\ A\subset J\})\le d(\operatorname{card}A). \qquad (13.28)$$


426 13. Unfulfilled Dreams 


Proof For such a set A, consider the function $f_A$ on $\mathcal J'$ given by

$$f_A(J)=\frac1{d(\operatorname{card}A)}\mathbf 1_{\{A\subset J\}}.$$

The main argument is to prove that any convex combination of functions of the type
$f_A$ takes at least one value < 1 (we will then appeal to the Hahn-Banach theorem).
Suppose, for contradiction, that this is not the case, so that there exist coefficients
$a_A\ge0$ of sum 1 for which

$$\forall J\in\mathcal J',\quad\sum_Aa_Af_A(J)=\sum_{A\subset J}\frac{a_A}{d(\operatorname{card}A)}\ge1. \qquad (13.29)$$

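For $1\le k\le n$, let $\mathcal G_k$ be the collection of all the sets A for which card A = k and
$a_A\ge2^{k+1}\delta^k$. Since $\sum_Aa_A=1$, we have $\operatorname{card}\mathcal G_k\le\delta^{-k}2^{-k-1}$, and thus

$$\sum_{k\ge1}\delta^k\operatorname{card}\mathcal G_k\le\frac12. \qquad (13.30)$$

We claim that

$$\forall J\in\mathcal J',\ \exists k\le n,\ \exists A\in\mathcal G_k,\ A\subset J. \qquad (13.31)$$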


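Indeed, otherwise, we can find $J\in\mathcal J'$ for which
$$A\subset J,\ \operatorname{card}A=k\le n\ \Rightarrow\ a_A<2^{k+1}\delta^k,$$
and thus, since J has $\binom nk$ subsets of cardinality k, using the definition of d(k) and (13.13),

$$\sum_{A\subset J}\frac{a_A}{d(\operatorname{card}A)}\le\sum_{1\le k\le n}\binom nk\frac{2^{k+1}\delta^k}{d(k)}\le\sum_{1\le k\le n}2^{-k}<1.$$

This contradicts (13.29) and proves (13.31).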
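The key numerical fact in the step above can be checked directly. Here d(k) is read as $2(4en\delta/k)^k$ in (13.27), and under that reading the factor $\delta^k$ cancels from each term of the middle sum. The function name is hypothetical, and the sketch checks only that each term is at most $2^{-k}$, so that the total stays below 1.

```python
from math import comb, e

def ratio_term(n, k):
    """C(n,k) * 2^(k+1) * delta^k / d(k), assuming d(k) = 2 (4 e n delta / k)^k.

    The factor delta^k cancels, leaving C(n,k) * (k / (2 e n))^k.
    """
    return comb(n, k) * (k / (2 * e * n)) ** k

for n in (1, 5, 20, 100):
    terms = [ratio_term(n, k) for k in range(1, n + 1)]
    # By (13.13), C(n,k) <= (e n / k)^k, so the k-th term is at most 2^-k.
    assert all(t <= 2.0 ** -k for k, t in enumerate(terms, start=1))
    print(n, sum(terms))  # always below 1, which is what contradicts (13.29)
```

Because δ drops out, the check does not depend on the selector probability. That is exactly why the constant $4e$ in d(k) can be universal.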

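To conclude the argument, we consider $\mathcal G=\bigcup_{1\le k\le n}\mathcal G_k$. Consider $(\delta_i)_{i\le M}$
such that $\sum_{i\in J}\delta_i\ge n$ for some $J\in\mathcal J'$; since card J = n, this means that $\delta_i=1$ for
all $i\in J$. Then (13.31) proves that J contains a set $A\in\mathcal G$, so that $(\delta_i)_{i\le M}\in H_A$, and we
have shown that the event $\{\sup_{J\in\mathcal J'}\sum_{i\in J}\delta_i\ge n\}$ in the right of (13.26) is contained in
$\bigcup_{A\in\mathcal G}H_A$. Now $\sum_{A\in\mathcal G}\delta^{\operatorname{card}A}\le1/2$ from (13.30). Thus, this event is δ-small. Using (13.26), we
obtain that the event (13.22) is δ-small, a contradiction which proves that (13.29) is
impossible.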

We have proved that the convex hull C of the functions of the type $f_A$ is disjoint
from the set $\mathcal U$ of functions which are everywhere > 1. The set $\mathcal U$ is open and convex.
The Hahn-Banach theorem asserts that we can separate the convex sets C and $\mathcal U$ by
a linear functional. That is, there exist such a functional φ on $\mathbb R^{\mathcal J'}$ and a number a
such that $\varphi(f)\le a<\varphi(g)$ for $f\in C$ and $g\in\mathcal U$. For $g\in\mathcal U$ and $h\ge0$, we have
$g+\lambda h\in\mathcal U$ for each $\lambda>0$. Thus, $\varphi(g+\lambda h)>a$, and hence, $\varphi(h)\ge0$. Thus, φ
is positive. After multiplying φ and a by a suitable positive constant, we can then assume that φ is given by a probability measure ν on $\mathcal J'$,
$\varphi(f)=\int f(J)\,d\nu(J)$. Since $a<\varphi(g)$ whenever g is a constant > 1, we get $a\le1$.
Thus, $\int f(J)\,d\nu(J)\le1$ for each $f\in C$ and in particular for f of the type $f_A$. We
have proved (13.28). □


Lemma 13.4.7 Assume that the event (13.26) is not δ-small. Then this event has a
probability $\ge\exp(-Ln)$.


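Proof Consider the probability ν on the set $\mathcal J'$ of (13.25) as in (13.28) and the r.v.
(depending on the random input $(\delta_i)_{i\le M}$)

$$Y=\nu(\{J\in\mathcal J';\ \forall i\in J,\ \delta_i=1\})=\nu(\{J;\ (\delta_i)\in H_J\})=\int\mathbf 1_{\{(\delta_i)\in H_J\}}\,d\nu(J).$$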


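Obviously, the event (13.26) contains the event $\{Y>0\}$. The plan is to use the Paley-Zygmund
inequality in the weak form

$$\mathsf P(Y>0)\ge\frac{(\mathsf EY)^2}{\mathsf EY^2}, \qquad (13.32)$$

which is a simple consequence of the Cauchy-Schwarz inequality. First,
$$\mathsf EY=\mathsf E\int\mathbf 1_{\{(\delta_i)\in H_J\}}\,d\nu(J)=\int\mathsf P(H_J)\,d\nu(J)=\delta^n, \qquad (13.33)$$

since ν is supported by $\mathcal J'$ and card J = n for $J\in\mathcal J'$. Next,

$$Y^2=\nu^{\otimes2}(\{(J,J');\ (\delta_i)\in H_J,\ (\delta_i)\in H_{J'}\})=\nu^{\otimes2}(\{(J,J');\ (\delta_i)\in H_J\cap H_{J'}\}),$$

so that, proceeding as in (13.33), and since $\mathsf P((\delta_i)\in H_J\cap H_{J'})=\delta^{\operatorname{card}(J\cup J')}$,
$$\mathsf EY^2=\iint\delta^{\operatorname{card}(J\cup J')}\,d\nu(J)\,d\nu(J'). \qquad (13.34)$$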


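Now, since $\operatorname{card}(J\cup J')=2n-\operatorname{card}(J\cap J')$ for $J,J'\in\mathcal J'$, the choice $A=J\cap J'$ shows that

$$\delta^{\operatorname{card}(J\cup J')}\le\sum_{A\subset J}\delta^{2n-\operatorname{card}A}\mathbf 1_{\{A\subset J'\}},$$

and therefore, using (13.28) in the second line and again (13.13) in the last line
(with the convention d(0) = 1, which accounts for $A=\emptyset$, and with the term k = 0 of the last sum equal to 1),

$$\begin{aligned}\iint\delta^{\operatorname{card}(J\cup J')}\,d\nu(J)\,d\nu(J')&\le\int\sum_{A\subset J}\delta^{2n-\operatorname{card}A}\,\nu(\{J';\ A\subset J'\})\,d\nu(J)\\&\le\sum_{0\le k\le n}\binom nk\delta^{2n-k}d(k)\\&\le2\delta^{2n}\sum_{0\le k\le n}\Big(\frac{4e^2n^2}{k^2}\Big)^k.\end{aligned} \qquad (13.35)$$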


An elementary computation shows that the last term dominates in the sum, so that
the right-hand side of (13.35) is $\le\delta^{2n}\exp(Ln)$, and recalling (13.33) and (13.34), this
proves that $\mathsf EY^2\le\exp(Ln)(\mathsf EY)^2$ and completes the proof using (13.32). □
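The "elementary computation" can be checked numerically. The sketch below assumes the last line of (13.35) has the form $2\delta^{2n}\sum_{0\le k\le n}(4e^2n^2/k^2)^k$, with the k = 0 summand equal to 1. It checks that the summands increase up to k = n and that the sum is at most $(n+1)(4e^2)^n$, which is of order $\exp(Ln)$.

```python
from math import e, log

def moment_terms(n):
    """Summands (4 e^2 n^2 / k^2)^k for 0 <= k <= n; the k = 0 summand is taken to be 1."""
    return [1.0] + [(4 * e**2 * n**2 / k**2) ** k for k in range(1, n + 1)]

for n in (1, 5, 20, 50):
    terms = moment_terms(n)
    # The summands increase with k (for k < 2n), so the last one, (4e^2)^n, is the largest.
    assert max(range(n + 1), key=terms.__getitem__) == n
    total = sum(terms)
    assert total <= (n + 1) * (4 * e**2) ** n
    print(n, log(total) / n)  # stays bounded, i.e. total <= exp(L n)
```

The ratio of consecutive summands near k = n is about 1/4. The sum is therefore essentially a geometric series dominated by its last term.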


Proof of Proposition 13.4.5 Consider the r.v. $X=\sup_{J\in\mathcal J}\sum_{i\in J}\delta_i$. We assume
that the event $\{X\ge n\}$ is not δ-small. Lemma 13.4.7 implies that

$$\mathsf P(X\ge n)\ge\exp(-L_1n). \qquad (13.36)$$

From this fact alone, we shall bound $\delta(\mathcal J)=\mathsf EX$ from below. Using Markov's
inequality, we know that $\mathsf P_\delta(D)\ge1/2$, where $D=\{X\le2\delta(\mathcal J)\}$, viewed as a subset of $\mathcal M$. Recalling the
set $D^{(q)}$ of Definition 13.3.1, given two integers q and $k\ge0$, we define similarly
$D^{(q,k)}$ as the set of subsets J of $M^*$ which have the property that whenever one
considers $J^1,\ldots,J^q\in D$, then
$$\operatorname{card}\Big(J\setminus\bigcup_{\ell\le q}J^\ell\Big)\ge k+1.$$
Thus, $D^{(q,0)}=D^{(q)}$, and as in (13.15), one proves that

$$\{X>2q\delta(\mathcal J)+k\}\subset D^{(q,k)}. \qquad (13.37)$$

The heart of the matter is Theorem 3.1.1 of [121], which asserts that
$$\mathsf P_\delta(D^{(q,k)})\le\frac{2^q}{q^k}.$$
Comparing with (13.36) and (13.37) then yields
$$2q\delta(\mathcal J)+k<n\ \Rightarrow\ \exp(-L_1n)\le\frac{2^q}{q^k}.$$


Let us fix q with $q\ge\exp(2L_1)$, so that q is now a universal constant. If $2q\delta(\mathcal J)\ge n$,
we have proved that $\delta(\mathcal J)\ge n/L$, so we may assume that $2q\delta(\mathcal J)<n$. Let
us consider $k\ge0$ with $2q\delta(\mathcal J)+k<n$. Then $\exp(-L_1n)\le2^q/q^k$, so that
$\exp(-L_1n)\le2^q\exp(-2L_1k)$, and thus $2k-n\le L$, so that $k\le n/2+L$. So,
we have proved that $k\le n/2+L$ whenever $k<n-2q\delta(\mathcal J)$. Consequently
$n-2q\delta(\mathcal J)\le n/2+L+1$, and thus $\delta(\mathcal J)\ge(n-L_0)/L_0$. We have proved that
$\delta(\mathcal J)\ge n/L$ when $n\ge2L_0$. This finishes the proof in that case.

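Finally, we finish the proof when $n<2L_0$. Since our assumption is that the event
$\{X\ge n\}$ is not δ-small, the larger event $\{X\ge1\}$ is not δ-small. Consider the union
I of all the elements of $\mathcal J$, and let $m=\operatorname{card}I$. Then $\{X\ge1\}\subset\bigcup_{i\in I}H_{\{i\}}$, so that,
since this event is not δ-small (otherwise the singletons $\{i\}$, $i\in I$, would witness that it is), we have $\delta\operatorname{card}I>1/2$. So if
$\Xi=\{\exists i\in I,\ \delta_i=1\}$, then $\mathsf P(\Xi)=1-(1-\delta)^m\ge1/L$. Now $X\ge\mathbf 1_\Xi$, so that taking
expectation, we get $\delta(\mathcal J)\ge\mathsf P(\Xi)\ge1/L$. □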


Part III 
Practicing 


Chapter 14
Empirical Processes, II


The reader should review Sect. 6.8, where we started the study of empirical
processes. Empirical processes are a vast topic, but here our goal is pretty limited.
In Sect. 14.1, we prove a “bracketing theorem” to illustrate again the power of the
methods of Sect. 9.4. In Sects. 14.2 and 14.3, we prove two specific results, which
illustrate in particular that Proposition 6.8.2 performs no miracle: it is the part
“without cancellation” which requires work and for which one must use a specific
method in each case.

We denote by $(X_i)$ an i.i.d. sequence of r.v.s valued in a measure space $(\Omega,\mu)$,
μ being the common law of the $(X_i)$.


14.1 Bracketing 


Theorem 14.1.1 Consider a countable class $\mathcal F$ of functions in $L^2(\mu)$ with $0\in\mathcal F$.
Consider an admissible sequence $(\mathcal A_n)$ of partitions of $\mathcal F$. For $A\in\mathcal A_n$, define the
function $h_A$ by

$$h_A(\omega)=\sup_{f,f'\in A}|f(\omega)-f'(\omega)|. \qquad (14.1)$$

Consider an integer $N\ge1$. Assume that for a certain $j_0=j_0(\mathcal F)\in\mathbb Z$, we have

$$\|h_{\mathcal F}\|_2\le\frac{2^{-j_0}}{\sqrt N}. \qquad (14.2)$$

Assume that for each $n\ge1$ and each $A\in\mathcal A_n$, we are given a number $j_n(A)\in\mathbb Z$
with

$$\int\big(2^{2j_n(A)}h_A^2\big)\wedge1\,d\mu\le\frac{2^n}N, \qquad (14.3)$$

and let
$$S=\sup_{f\in\mathcal F}\sum_{n\ge0}2^{n-j_n(A_n(f))}, \qquad (14.4)$$

where $A_n(f)$ denotes the unique element of $\mathcal A_n$ which contains f (and $j_0(A_0(f))=j_0$). Then

$$S_N(\mathcal F)=\mathsf E\sup_{f\in\mathcal F}\Big|\sum_{i\le N}(f(X_i)-\mu(f))\Big|\le LS. \qquad (14.5)$$


It is instructive to rewrite (14.2) as $\int2^{2j_0}h_{\mathcal F}^2\,d\mu\le1/N$ in order to compare
it with (14.3). The normalization used in this theorem is not intuitive, but should be
clearer after you study the proof of the next corollary.

Corollary 14.1.2 With the notation of Theorem 14.1.1, define now

$$S^*=\sup_{f\in\mathcal F}\sum_{n\ge0}2^{n/2}\|h_{A_n(f)}\|_2. \qquad (14.6)$$

Then
$$S_N(\mathcal F)\le L\sqrt NS^*. \qquad (14.7)$$

Since $\Delta(A)\le\|h_A\|_2$, we have $\gamma_2(\mathcal F,d_2)\le S^*$; but it is not true in general that
$S_N(\mathcal F)\le L\sqrt N\gamma_2(\mathcal F,d_2)$.


Proof This is routine. Define $j_n(A)$ as the largest integer j for which
$\|h_A\|_2\le2^{n/2-j}/\sqrt N$, so that $2^{n/2-j_n(A)}\le2\sqrt N\|h_A\|_2$, and consequently,

$$\sum_{n\ge0}2^{n-j_n(A_n(f))}\le2\sqrt N\sum_{n\ge0}2^{n/2}\|h_{A_n(f)}\|_2.$$

Since $\|h_A\|_2\le2^{n/2-j_n(A)}/\sqrt N$, (14.3) holds, and the result follows from
Theorem 14.1.1. □
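The choice of $j_n(A)$ in this proof is explicit enough to compute. The sketch below takes the value of $\|h_A\|_2$, n and N as inputs (hypothetical helper names). It verifies the three facts the proof uses: the defining inequality and its maximality, the bound $2^{n/2-j_n(A)}\le2\sqrt N\|h_A\|_2$, and condition (14.3) via $(x\wedge1)\le x$. A tiny relative tolerance absorbs floating-point rounding at exact boundary cases.

```python
import math

def j_n(h_norm, n, N):
    """Largest integer j with h_norm <= 2**(n/2 - j) / sqrt(N); requires h_norm > 0."""
    # h <= 2^{n/2 - j} / sqrt(N)  <=>  j <= n/2 - log2(h * sqrt(N))
    return math.floor(n / 2 - math.log2(h_norm * math.sqrt(N)))

def check_corollary_choice(h_norm, n, N, tol=1e-12):
    j = j_n(h_norm, n, N)
    rootN = math.sqrt(N)
    defining = h_norm <= 2 ** (n / 2 - j) / rootN * (1 + tol)
    maximal = h_norm > 2 ** (n / 2 - j - 1) / rootN
    # 2^{n/2 - j} <= 2 sqrt(N) ||h_A||_2, which controls the terms of (14.4)
    summable = 2 ** (n / 2 - j) <= 2 * rootN * h_norm * (1 + tol)
    # (14.3) via (x wedge 1) <= x: 2^{2j} ||h_A||_2^2 <= 2^n / N
    cond_143 = 2 ** (2 * j) * h_norm ** 2 <= 2 ** n / N * (1 + tol)
    return j, defining and maximal and summable and cond_143

print(check_corollary_choice(0.3, 3, 100))  # (-1, True)
```

Only the $L^2$ norms $\|h_A\|_2$ enter this choice. That is why the corollary yields a bound of "$\gamma_2$ type" in terms of the brackets $h_A$.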


Exercise 14.1.3 Given two (measurable) functions $f_1\le f_2$, define the bracket
$[f_1,f_2]$ as the set of functions $\{f;\ f_1\le f\le f_2\}$. Given a class $\mathcal F$ of functions
and ε > 0, define $N_{[\,]}(\mathcal F,\varepsilon)$ as the smallest number of brackets $[f_1,f_2]$ with
$\|f_2-f_1\|_2\le\varepsilon$ which can cover $\mathcal F$. Use Corollary 14.1.2 to prove that

$$S_N(\mathcal F)\le L\sqrt N\int_0^\infty\sqrt{\log N_{[\,]}(\mathcal F,\varepsilon)}\,d\varepsilon. \qquad (14.8)$$


Inequality (14.8) is known as Ossiander’s bracketing theorem [77], and (14.7) is 
simply the “generic chaining version” of it. The proof of Ossiander’s bracketing 
theorem requires a tricky idea beyond the ideas of Dudley’s bound. In our 
approach, Ossiander’s bracketing theorem is a straightforward consequence of 
Theorem 14.1.1, itself a straightforward consequence of Theorem 9.4.1. None of 
the simple arguments there involves chaining. All the work involving chaining 
has already been performed in Theorem 9.4.1. As suggested in Sect. 6.9, in some 
sense, Theorem 9.4.1 succeeds in extending Ossiander’s bracketing theorem to a 
considerably more general setting. 


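Proof of Theorem 14.1.1 Let us fix $A\in\mathcal A_n$, and consider the r.v.s
$W_i=(2^{2j_n(A)}h_A(X_i)^2)\wedge1$, so that by (14.3), we have $\sum_{i\le N}\mathsf EW_i\le2^n$. Consider a
parameter $u\ge1$. Then Lemma 7.7.2 (b) yields

$$\mathsf P\Big(\sum_{i\le N}W_i\ge u2^{n+2}\Big)\le\exp(-u2^{n+1}). \qquad (14.9)$$

Consider the event Ω(u) defined by

$$\forall n\ge0,\ \forall A\in\mathcal A_n,\quad\sum_{i\le N}\big(2^{2j_n(A)}h_A(X_i)^2\big)\wedge1\le u2^{n+2}, \qquad (14.10)$$

so that (14.9) and the union bound yield $\mathsf P(\Omega(u))\ge1-L\exp(-u)$. Let us consider
independent Bernoulli r.v.s $\varepsilon_i$, which are independent of the $X_i$, and let us recall that
$\mathsf E_\varepsilon$ denotes expectation in the r.v.s $\varepsilon_i$ only. Given the r.v.s $X_i$, we consider the set
T of all sequences of the type $(x_i)_{1\le i\le N}=(f(X_i))_{1\le i\le N}$ for $f\in\mathcal F$. To bound
$\mathsf E_\varepsilon\sup_{x\in T}|\sum_{i\le N}\varepsilon_ix_i|$, we appeal to Theorem 9.4.1. Also, since $|f(X_i)-f'(X_i)|\le
h_A(X_i)$ for $f,f'\in A$, (9.42) (with 4u rather than u) follows from (14.10). Finally,
for $f\in\mathcal F$, we have $|f(X_i)|\le h_{\mathcal F}(X_i)$ (because $0\in\mathcal F$), so that

$$|f(X_i)|\mathbf 1_{\{2|f(X_i)|\ge2^{-j_0}\}}\le h_{\mathcal F}(X_i)\mathbf 1_{\{2h_{\mathcal F}(X_i)\ge2^{-j_0}\}}.$$

We then use (9.44) with p = 1 to obtain

$$\mathsf E_\varepsilon\sup_{f\in\mathcal F}\Big|\sum_{i\le N}\varepsilon_if(X_i)\Big|\le Lu\sup_{f\in\mathcal F}\sum_{n\ge0}2^{n-j_n(A_n(f))}+L\sum_{i\le N}h_{\mathcal F}(X_i)\mathbf 1_{\{2h_{\mathcal F}(X_i)\ge2^{-j_0}\}}. \qquad (14.11)$$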
The expectation of the last term is $LN\int h_{\mathcal F}\mathbf 1_{\{2h_{\mathcal F}\ge2^{-j_0}\}}\,d\mu$. Now, since
$h\mathbf 1_{\{h\ge v\}}\le h^2/v$, and using (14.2) in the last inequality,

$$N\int h_{\mathcal F}\mathbf 1_{\{2h_{\mathcal F}\ge2^{-j_0}\}}\,d\mu\le N2^{j_0+1}\int h_{\mathcal F}^2\,d\mu\le2^{-j_0+1}.$$

Consequently, taking expectation in (14.11) and using that $\mathsf P(\Omega(u))\ge1-L\exp(-u)$, we obtain

$$\mathsf E\sup_{f\in\mathcal F}\Big|\sum_{i\le N}\varepsilon_if(X_i)\Big|\le L\sup_{f\in\mathcal F}\sum_{n\ge0}2^{n-j_n(A_n(f))}=LS,$$

and we conclude the proof using Lemma 11.8.4. □


14.2 The Class of Squares of a Given Class 


It is beyond the scope of this book to cover the theory of empirical processes (even
restricted to its applications to Analysis and Banach Space theory). In the rest of
this chapter, we give two sample results, which are facets of the following problem.
Consider independent r.v.s $X_i$ valued in $\mathbb R^m$. Denoting by $\langle\cdot,\cdot\rangle$ the canonical duality
of $\mathbb R^m$ with itself, and T a subset of $\mathbb R^m$, we are interested in bounding the quantity

$$\sup_{t\in T}\Big|\sum_{i\le N}\big(\langle X_i,t\rangle^2-\mathsf E\langle X_i,t\rangle^2\big)\Big|. \qquad (14.12)$$

As a warm-up, we recommend that the reader study the following exercise. The
results there are often needed.


Exercise 14.2.1 Given a probability μ, for a measurable function f, we define the
following two norms (Orlicz norms)

$$\|f\|_{\psi_1}=\inf\Big\{A>0;\ \int\exp\Big(\frac{|f|}A\Big)\,d\mu\le2\Big\} \qquad (14.13)$$

and

$$\|f\|_{\psi_2}=\inf\Big\{A>0;\ \int\exp\Big(\frac{f^2}{A^2}\Big)\,d\mu\le2\Big\}. \qquad (14.14)$$

(a) Prove that if $k\ge1$,

$$\int\exp|f|\,d\mu\le2^k\ \Rightarrow\ \|f\|_{\psi_1}\le k. \qquad (14.15)$$

Hint: Use Hölder's inequality.
(b) Prove that
$$\forall u>0,\ \mathsf P(|f|\ge u)\le2\exp(-u)\ \Rightarrow\ \|f\|_{\psi_1}\le L$$
and
$$\forall u>0,\ \mathsf P(|f|\ge u)\le2\exp(-u^2)\ \Rightarrow\ \|f\|_{\psi_2}\le L.$$

(c) Prove that

$$\|f\|_{\psi_1}\le L\|f\|_{\psi_2} \qquad (14.16)$$

and

$$\|f_1f_2\|_{\psi_1}\le\|f_1\|_{\psi_2}\|f_2\|_{\psi_2}. \qquad (14.17)$$

(d) On a rainy day, obtain a completely uninteresting and useless result by
computing the exact value of $\|g\|_{\psi_2}$, where g is a standard Gaussian r.v.

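(e) If $(\varepsilon_i)$ denote independent Bernoulli r.v.s and $(a_i)$ denote real numbers, prove
that

$$\Big\|\sum_ia_i\varepsilon_i\Big\|_{\psi_2}\le L\Big(\sum_ia_i^2\Big)^{1/2}. \qquad (14.18)$$

Hint: Use the sub-Gaussian inequality (6.1.1).

(f) Prove that if the r.v.s $Y_i$ are independent and centered, then for v > 0, it holds
that

$$\mathsf P\Big(\Big|\sum_{i\le N}Y_i\Big|\ge v\Big)\le2\exp\Big(-\frac1L\min\Big(\frac{v^2}{\sum_{i\le N}\|Y_i\|_{\psi_1}^2},\frac v{\max_{i\le N}\|Y_i\|_{\psi_1}}\Big)\Big). \qquad (14.19)$$

Hint: Prove that for $|\lambda|\,\|Y_i\|_{\psi_1}\le1/2$, we have $\mathsf E\exp\lambda Y_i\le\exp(L\lambda^2\|Y_i\|_{\psi_1}^2)$,
and copy the proof of Bernstein's inequality.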
(g) Prove that if the r.v.s $Y_i$ are independent and centered, then for v > 0, it holds
that

$$\mathsf P\Big(\Big|\sum_{i=1}^NY_i\Big|\ge v\Big)\le2\exp\Big(-\frac{v^2}{L\sum_{i=1}^N\|Y_i\|_{\psi_2}^2}\Big). \qquad (14.20)$$

The tail inequalities (14.19) and (14.20) motivate the use of the distances $d_{\psi_1}$ and
$d_{\psi_2}$ associated with the norms $\|\cdot\|_{\psi_1}$ and $\|\cdot\|_{\psi_2}$.

As in the previous section, we consider a probability space $(\Omega,\mu)$, and we denote
by $(X_i)_{i\le N}$ independent r.v.s valued in Ω of law μ. We recall the norm $\|\cdot\|_{\psi_1}$ of (14.13) and the
associated distance $d_{\psi_1}$. Before we come to our main result, we prove a simpler but
connected fact. We consider independent Bernoulli r.v.s $\varepsilon_i$ independent of the r.v.s
$X_i$.

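Theorem 14.2.2 Consider a (countable) class of functions $\mathcal F$ with $0\in\mathcal F$. Then for
any integer N,

$$\mathsf E\sup_{f\in\mathcal F}\Big|\sum_{i\le N}\varepsilon_if(X_i)\Big|\le L\sqrt N\gamma_2(\mathcal F,d_{\psi_1})+L\gamma_1(\mathcal F,d_{\psi_1}). \qquad (14.21)$$

To understand this statement, we observe that $\gamma_2(\mathcal F,d_{\psi_1})\le\gamma_1(\mathcal F,d_{\psi_1})$, but that
the factor $\sqrt N$ is in front of the smaller term $\gamma_2(\mathcal F,d_{\psi_1})$.

Exercise 14.2.3 After studying the proof of Theorem 14.2.2, produce a decomposition
of $\mathcal F$ as in Theorem 6.8.3, where $S_N(\mathcal F)$ is replaced by $L\sqrt N\gamma_2(\mathcal F,d_{\psi_1})+L\gamma_1(\mathcal F,d_{\psi_1})$.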


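We start the proof of Theorem 14.2.2 with a simple fact.
Lemma 14.2.4 If $u\ge1$, then

$$\mathsf P\Big(\sum_{i\le N}|f(X_i)|\ge2uN\|f\|_{\psi_1}\Big)\le\exp(-uN). \qquad (14.22)$$

Proof Replacing f by $f/\|f\|_{\psi_1}$, we may assume $\|f\|_{\psi_1}=1$. Then, by independence,
$$\mathsf E\exp\sum_{i\le N}|f(X_i)|\le2^N\le e^N.$$
Using the inequality $\mathsf P(X\ge v)\le\exp(-v)\mathsf E\exp X$ yields $\mathsf P(\sum_{i\le N}|f(X_i)|\ge N(u+1))\le\exp(-uN)$,
and $N(u+1)\le2uN$ since $u\ge1$. □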
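Lemma 14.2.4 can be tested exactly for one concrete law. The sketch assumes the form $\mathsf P(\sum_{i\le N}|f(X_i)|\ge2uN\|f\|_{\psi_1})\le\exp(-uN)$, which is what the proof gives since $N(u+1)\le2uN$. It takes f(X_i) standard exponential, so that $\|f\|_{\psi_1}=2$ (because $\mathsf E\exp(f/A)=A/(A-1)$), and the sum has the Erlang tail written below. The choice of the exponential law is ours, for illustration only.

```python
import math

def erlang_tail(N, x):
    """P(E_1 + ... + E_N >= x) for i.i.d. standard exponential r.v.s E_i."""
    return math.exp(-x) * sum(x**j / math.factorial(j) for j in range(N))

# For f ~ Exp(1): E exp(f / A) = A / (A - 1), which equals 2 at A = 2, so ||f||_psi1 = 2.
PSI1_NORM = 2.0

for N in (1, 5, 20):
    for u in (1.0, 2.0, 5.0):
        threshold = 2 * u * N * PSI1_NORM      # assumed threshold 2uN||f||_psi1 of (14.22)
        assert erlang_tail(N, threshold) <= math.exp(-u * N)
print(erlang_tail(5, 20.0), math.exp(-5.0))
```

For this law the bound is far from tight. The lemma is designed to hold uniformly over all f with a given $\psi_1$ norm.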


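We consider an admissible sequence $(\mathcal A_n)$ of partitions of $\mathcal F$ such that

$$\forall f\in\mathcal F,\quad\sum_{n\ge0}2^{n/2}\Delta(A_n(f),d_{\psi_1})\le L\gamma_2(\mathcal F,d_{\psi_1}), \qquad (14.23)$$

$$\forall f\in\mathcal F,\quad\sum_{n\ge0}2^n\Delta(A_n(f),d_{\psi_1})\le L\gamma_1(\mathcal F,d_{\psi_1}). \qquad (14.24)$$

For each $A\in\mathcal A_n$, we choose a point $f_{n,A}\in A$. We will lighten notation by writing
$f_A$ rather than $f_{n,A}$. For $A\in\mathcal A_n$ with $n\ge1$, we denote by A' the unique element of
$\mathcal A_{n-1}$ that contains A. This defines as usual a chaining in $\mathcal F$, by choosing $\pi_n(f)=f_A$
where $A=A_n(f)$.

We denote by $n_1$ the largest integer with $2^{n_1}\le N$, so that $N<2^{n_1+1}$.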
Lemma 14.2.5 Consider a parameter $u\ge1$ and the event Ω(u) defined by the
following conditions:

$$\forall n,\ 1\le n\le n_1,\ \forall A\in\mathcal A_n,\quad\Big|\sum_{i\le N}\varepsilon_i\big(f_A(X_i)-f_{A'}(X_i)\big)\Big|\le Lu2^{n/2}\sqrt N\Delta(A',d_{\psi_1}). \qquad (14.25)$$

$$\forall n>n_1,\ \forall A\in\mathcal A_n,\quad\sum_{i\le N}\big|f_A(X_i)-f_{A'}(X_i)\big|\le Lu2^n\Delta(A',d_{\psi_1}). \qquad (14.26)$$

Then
$$\mathsf P(\Omega(u))\ge1-L\exp(-u). \qquad (14.27)$$


Proof The r.v. $Y_i=\varepsilon_i(f_A(X_i)-f_{A'}(X_i))$ is centered, and since $f_A$ and $f_{A'}$ belong
to A', we have $\|Y_i\|_{\psi_1}\le\Delta(A',d_{\psi_1})$. We use (14.19) to obtain that for any v > 0,

$$\mathsf P\Big(\Big|\sum_{i\le N}Y_i\Big|\ge v\Delta(A',d_{\psi_1})\Big)\le2\exp\Big(-\frac1L\min\Big(\frac{v^2}N,v\Big)\Big). \qquad (14.28)$$

Since $n\le n_1$, we have $\sqrt N\ge2^{n/2}$. For $u\ge1$, setting $v=Lu2^{n/2}\sqrt N$, then
$v^2/N\ge Lu2^n$ and $v\ge Lu2^n$. Thus, (14.28) implies that the inequality in (14.25)
occurs with probability $\ge1-L\exp(-2u2^n)$.

Next, since $\|f_A-f_{A'}\|_{\psi_1}\le\Delta(A',d_{\psi_1})$ and $u2^n/N\ge1$ for $n>n_1$, using
Lemma 14.2.4 for $2u2^n/N$ rather than u implies as desired that the inequality
in (14.26) occurs with probability $\ge1-L\exp(-2u2^n)$.

Since $\operatorname{card}\mathcal A_n\le N_n=2^{2^n}$ and since $\sum_{n\ge0}2^{2^n}\exp(-2u2^n)\le L\exp(-u)$,
(14.27) follows from the union bound. □
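The last step uses $\sum_{n\ge0}2^{2^n}\exp(-2u2^n)\le L\exp(-u)$ for $u\ge1$. A quick numerical check of this series, computed in log space to avoid overflowing $2^{2^n}$, shows that L = 1 already suffices on a grid of u values. The grid is an arbitrary illustrative choice, and this is a spot check rather than a proof.

```python
import math

def union_bound_sum(u, n_max=40):
    """sum_{n >= 0} 2^(2^n) exp(-2u 2^n), written as exp(2^n (log 2 - 2u)) to avoid overflow."""
    return sum(math.exp(2**n * (math.log(2) - 2 * u)) for n in range(n_max))

us = [1.0, 1.5, 2.0, 5.0, 10.0, 50.0]
ratios = [union_bound_sum(u) / math.exp(-u) for u in us]
print(max(ratios))  # about 0.95, attained at u = 1
```

The series is dominated by its first term $2e^{-2u}$. That is why the doubly exponential cardinality bound $N_n=2^{2^n}$ is harmless here.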


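Proof of Theorem 14.2.2 We prove that on the event Ω(u) of Lemma 14.2.5, we
have

$$\sup_{f\in\mathcal F}\Big|\sum_{i\le N}\varepsilon_if(X_i)\Big|\le Lu\big(\sqrt N\gamma_2(\mathcal F,d_{\psi_1})+\gamma_1(\mathcal F,d_{\psi_1})\big), \qquad (14.29)$$

which by taking expectation and using (14.27) implies the result. To prove (14.29),
since $0\in\mathcal F$, we may assume that $\pi_0(f)=0$. We deduce from (14.25) that for each
n with $1\le n\le n_1$, one has

$$\Big|\sum_{i\le N}\varepsilon_i\big(\pi_n(f)(X_i)-\pi_{n-1}(f)(X_i)\big)\Big|\le Lu2^{n/2}\sqrt N\Delta(A_{n-1}(f),d_{\psi_1}). \qquad (14.30)$$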
For $n>n_1$, we use that from (14.26), we have

$$\Big|\sum_{i\le N}\varepsilon_i\big(\pi_n(f)(X_i)-\pi_{n-1}(f)(X_i)\big)\Big|\le\sum_{i\le N}\big|\pi_n(f)(X_i)-\pi_{n-1}(f)(X_i)\big|\le Lu2^n\Delta(A_{n-1}(f),d_{\psi_1}). \qquad (14.31)$$

Summation of these inequalities together with (14.23) and (14.24) concludes the
proof. □


We now come to the main result of this section. We recall the norm $\|\cdot\|_{\psi_2}$
of (14.14). We denote by $d_{\psi_2}$ the associated distance.


Theorem 14.2.6 ([41, 73]) Consider a (countable) class of functions $\mathcal F$ with $0\in\mathcal F$.
Assume that

$$\forall f\in\mathcal F,\quad\|f\|_{\psi_2}\le\Delta^*. \qquad (14.32)$$

Then for any integer N,

$$\mathsf E\sup_{f\in\mathcal F}\Big|\sum_{i\le N}\big(f(X_i)^2-\mathsf Ef^2\big)\Big|\le L\sqrt N\Delta^*\gamma_2(\mathcal F,d_{\psi_2})+L\gamma_2(\mathcal F,d_{\psi_2})^2. \qquad (14.33)$$

The point of the theorem is that we use information on the class $\mathcal F$ to bound the
empirical process on the class $\mathcal F^2=\{f^2;\ f\in\mathcal F\}$. This theorem does not follow
from Theorem 14.2.2 applied to the class $\mathcal F^2$. As we will show, it is
true that $\gamma_2(\mathcal F^2,d_{\psi_1})\le2\Delta^*\gamma_2(\mathcal F,d_{\psi_2})$, so that the first term in the right-hand side
of (14.33) is really the same (up to a constant) as in the right-hand side of (14.21), but the functional
$\gamma_1$ no longer occurs in (14.33).

As an example of a relevant situation, let us consider the case where $\Omega=\mathbb R^m$ and
where μ is the canonical Gaussian measure on $\mathbb R^m$, i.e., the law of an independent
sequence $(g_i)_{i\le m}$ of standard Gaussian r.v.s. Recalling that $\langle\cdot,\cdot\rangle$ denotes the
canonical duality between $\mathbb R^m$ and itself, for any $t\in\mathbb R^m$, we have

$$\int\langle t,x\rangle^2\,d\mu(x)=\|t\|_2^2, \qquad (14.34)$$

where $\|t\|_2$ denotes the Euclidean norm of t. In words, μ is “isotropic”. Thus, if
$X_i$ has law μ, then $\mathsf E\langle t,X_i\rangle^2=\|t\|_2^2$. Consider a subset T of $\mathbb R^m$, which is seen
as a set $\mathcal F$ of functions on Ω through the canonical duality $\langle\cdot,\cdot\rangle$. The left-hand side
of (14.33) is then simply

$$\mathsf E\sup_{t\in T}\Big|\sum_{i\le N}\big(\langle t,X_i\rangle^2-\|t\|_2^2\big)\Big|. \qquad (14.35)$$
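To get a feel for the quantity (14.35), here is a small Monte Carlo sketch for a finite set T with Gaussian $X_i$, as in the isotropic example. The function name, the sample sizes and the set T are illustrative assumptions. Because the random vectors depend only on the seed and the dimension m, enlarging T with the same seed can only increase the estimate, which gives a simple sanity check.

```python
import random

def quadratic_empirical_sup(T, N, reps=2000, seed=0):
    """Monte Carlo estimate of (14.35): E sup_{t in T} |sum_{i<=N} (<t, X_i>^2 - ||t||_2^2)|.

    T is a finite list of vectors of a common length m; the X_i are i.i.d.
    standard Gaussian vectors in R^m (the isotropic measure of (14.34)).
    """
    rng = random.Random(seed)
    m = len(T[0])
    sq_norms = [sum(c * c for c in t) for t in T]
    acc = 0.0
    for _ in range(reps):
        sums = [0.0] * len(T)
        for _ in range(N):
            x = [rng.gauss(0.0, 1.0) for _ in range(m)]
            for j, t in enumerate(T):
                s = sum(a * b for a, b in zip(t, x))
                sums[j] += s * s - sq_norms[j]    # centered thanks to (14.34)
        acc += max(abs(v) for v in sums)
    return acc / reps

T = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.6, 0.8, 0.0)]
print(quadratic_empirical_sup(T, N=50))
```

For a single unit vector t, the sum inside (14.35) is a centered chi-square variable with N degrees of freedom. The estimate should then be of order $\sqrt N$, which matches the leading term $\sqrt N\Delta^*\gamma_2$ of (14.33) rather than a term linear in N.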
A bound for this quantity is relevant in particular to the problem of signal
reconstruction, i.e., of (approximately) finding the transmitted signal $t\in T$ when
observing only the data $(\langle t,X_i\rangle)_{i\le N}$; see [73] for details.¹ In these applications,
one does not like to have $0\in\mathcal F$, but one assumes instead that $\mathcal F$ is symmetric (i.e.,
$-f\in\mathcal F$ if $f\in\mathcal F$). It is simple to show that (14.33) still holds in this case. (Let
us also observe that (14.33) does not hold when $\mathcal F$ is reduced to a single non-zero
function.)

Now, what is a possible strategy to prove Theorem 14.2.6? First, rather than the
left-hand side of (14.33), we shall bound $\mathsf E\sup_{f\in\mathcal F}|\sum_{i\le N}\varepsilon_if(X_i)^2|$, where $(\varepsilon_i)$
are independent Bernoulli r.v.s, independent of the r.v.s $(X_i)$, and use Lemma 11.8.4.
We have to bound the empirical process on the class $\mathcal F^2=\{f^2;\ f\in\mathcal F\}$. There is
a natural chaining $(\pi_n(f))$ on $\mathcal F$, witnessing the value of $\gamma_2(\mathcal F,d_{\psi_2})$. There simply
seems to be no other way than to use the chaining $(\pi_n(f)^2)$ on $\mathcal F^2$ and to control
the “increments along the chain”:

$$\sum_{i\le N}\varepsilon_i\big(\pi_n(f)(X_i)^2-\pi_{n-1}(f)(X_i)^2\big).$$

It seems unavoidable that we will have to control some of the quantities
$|\pi_n(f)(X_i)^2-\pi_{n-1}(f)(X_i)^2|$. Our hypotheses on $\mathcal F$ do not yield naturally a
control of these quantities, but rather of the quantities $|\pi_n(f)(X_i)-\pi_{n-1}(f)(X_i)|$.
Since

$$\pi_n(f)(X_i)^2-\pi_{n-1}(f)(X_i)^2=\big(\pi_n(f)(X_i)-\pi_{n-1}(f)(X_i)\big)\big(\pi_n(f)(X_i)+\pi_{n-1}(f)(X_i)\big),$$

it seems impossible to achieve anything unless we have some additional control of
the sequence $(\pi_n(f)(X_i)+\pi_{n-1}(f)(X_i))_{i\le N}$, which most likely means that we
must gain some control of the sequence $(f(X_i))_{i\le N}$ for all $f\in\mathcal F$. Indeed, a key
step of the proof will be to show that

$$\mathsf E\sup_{f\in\mathcal F}\Big(\sum_{i\le N}f(X_i)^2\Big)^{1/2}\le L\big(\sqrt N\Delta^*+\gamma_2(\mathcal F,d_{\psi_2})\big). \qquad (14.36)$$


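We now prepare the proof of Theorem 14.2.6. We consider an admissible sequence $(\mathcal{A}_n)$ of partitions of $\mathcal{F}$ such that

$$\forall f \in \mathcal{F}, \quad \sum_{n\ge 0}2^{n/2}\Delta(A_n(f), d_{\psi_2}) \le 2\gamma_2(\mathcal{F}, d_{\psi_2}) . \tag{14.37}$$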


¹ A few words about this general direction may be found in Sect. D.6.
For each $A \in \mathcal{A}_n$, we choose a point $f_{n,A} \in A$. We will lighten notation by writing $f_A$ rather than $f_{n,A}$. For $A \in \mathcal{A}_n$ with $n \ge 1$, we denote by $A'$ the unique element of $\mathcal{A}_{n-1}$ that contains $A$. This defines as usual a chaining in $\mathcal{F}$, by choosing $\pi_n(f) = f_A$ where $A = A_n(f)$.

We consider Bernoulli r.v.s $(\varepsilon_i)$ independent of the r.v.s $(X_i)$. We denote by $n_1$ the largest integer with $2^{n_1} \le N$, so that $N \le 2^{n_1+1}$.


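Lemma 14.2.7 Consider a parameter $u \ge 1$ and the event $\Omega(u)$ defined by the following conditions:

$$\forall n,\ 1 \le n \le n_1,\ \forall A \in \mathcal{A}_n, \quad \Big|\sum_{i\le N}\varepsilon_i\big(f_A(X_i)^2 - f_{A'}(X_i)^2\big)\Big| \le Lu2^{n/2}\sqrt{N}\Delta^*\Delta(A', d_{\psi_2}) . \tag{14.38}$$

$$\forall n > n_1,\ \forall A \in \mathcal{A}_n, \quad \sum_{i\le N}\big(f_A(X_i) - f_{A'}(X_i)\big)^2 \le Lu2^n\Delta(A', d_{\psi_2})^2 . \tag{14.39}$$

$$\forall A \in \mathcal{A}_{n_1}, \quad \sum_{i\le N}f_A(X_i)^2 \le LuN\Delta^{*2} . \tag{14.40}$$

Then
$$\mathrm{P}(\Omega(u)) \ge 1 - L\exp(-u) . \tag{14.41}$$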


Proof Let us first study (14.38). By (14.17), we have

$$\|f_A^2 - f_{A'}^2\|_{\psi_1} \le \|f_A - f_{A'}\|_{\psi_2}\|f_A + f_{A'}\|_{\psi_2} \le \Delta(A', d_{\psi_2}) \times 2\Delta^* .$$

Consequently, the r.v. $Y_i = \varepsilon_i(f_A(X_i)^2 - f_{A'}(X_i)^2)$ is centered and $\|Y_i\|_{\psi_1} \le 2\Delta^*\Delta(A', d_{\psi_2})$. We prove that the inequality in (14.38) occurs with probability $\ge 1 - L\exp(-2u2^n)$ just as in the case of (14.25).

Let us turn to the study of (14.39). It is obvious from the definition (or from (14.17)) that $\|f^2\|_{\psi_1} \le \|f\|_{\psi_2}^2$, so the function $f = (f_A - f_{A'})^2$ satisfies

$$\|f\|_{\psi_1} \le \|f_A - f_{A'}\|_{\psi_2}^2 \le \Delta(A', d_{\psi_2})^2 .$$

Also $u2^n/N \ge 1$ for $n > n_1$. Using Lemma 14.2.4 for $2u2^n/N$ rather than $u$ implies as desired that the inequality in (14.39) holds with probability $\ge 1 - L\exp(-2u2^n)$.

Using again Lemma 14.2.4, and since $\|f_A^2\|_{\psi_1} \le \|f_A\|_{\psi_2}^2 \le \Delta^{*2}$ by (14.32), we obtain that for any $A \in \mathcal{A}_{n_1}$, inequality (14.40) holds with probability $\ge 1 - L\exp(-2Nu)$.

Finally, we use the union bound. Since $\operatorname{card}\mathcal{A}_n \le N_n = 2^{2^n}$, and in particular $\operatorname{card}\mathcal{A}_{n_1} \le N_{n_1} \le 2^N$, and since $\sum_{n\ge 0}2^{2^n}\exp(-2u2^n) \le L\exp(-u)$, the result follows. □
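The numerical series in the union bound can be checked directly. The sketch below is our own illustration: the function name and the truncation `n_max` are hypothetical choices, not taken from the text. It evaluates $e^{u}\sum_{n\ge 0}2^{2^n}\exp(-2u2^n)$, computing each term in log-space so that $2^{2^n}$ is never formed.

```python
import math

def union_bound_ratio(u, n_max=60):
    """Return exp(u) * sum_{n=0}^{n_max} 2^(2^n) * exp(-2 u 2^n).

    Each term is evaluated as exp(2^n * (log 2 - 2u)), so 2^(2^n), which
    overflows a float for n >= 10, never has to be formed; the terms for
    large n simply underflow to 0.0. n_max is an illustrative truncation.
    """
    total = 0.0
    for n in range(n_max + 1):
        m = 2.0 ** n
        total += math.exp(m * (math.log(2.0) - 2.0 * u))
    return math.exp(u) * total

for u in (1.0, 2.0, 5.0):
    print(u, union_bound_ratio(u))
```

Each term of the series, multiplied by $e^u$, is decreasing in $u$. The worst case is therefore $u = 1$, where the ratio is slightly below 1, so in this sketch $L = 1$ would already do.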
We consider the random norm $W(f)$ given by

$$W(f) = \Big(\sum_{i\le N}f(X_i)^2\Big)^{1/2} . \tag{14.42}$$

Lemma 14.2.8 On the event $\Omega(u)$, we have

$$\forall f \in \mathcal{F}, \quad W(f) \le L\sqrt{u}\big(\sqrt{N}\Delta^* + \gamma_2(\mathcal{F}, d_{\psi_2})\big) . \tag{14.43}$$


Proof Given $f \in \mathcal{F}$, we denote by $\pi_n(f)$ the element $f_A$ where $A = A_n(f)$. We also observe that $A_{n-1}(f)$ is the unique element $A'$ in $\mathcal{A}_{n-1}$ which contains $A$. Writing $f = \pi_{n_1}(f) + \sum_{n>n_1}(\pi_n(f) - \pi_{n-1}(f))$, using the triangle inequality for $W$ implies

$$W(f) \le W(\pi_{n_1}(f)) + \sum_{n>n_1}W(\pi_n(f) - \pi_{n-1}(f)) .$$

Next, (14.40) implies $W(\pi_{n_1}(f)) \le L\sqrt{Nu}\,\Delta^*$, and (14.39) implies that for $n > n_1$, one has

$$W(\pi_n(f) - \pi_{n-1}(f)) \le L2^{n/2}\sqrt{u}\,\Delta(A_{n-1}(f), d_{\psi_2}) . \tag{14.44}$$

We conclude the proof with (14.37). □


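Proof of Theorem 14.2.6 Let us recall the event $\Omega(u)$ of Lemma 14.2.7. Our goal is to prove that when this event occurs, then

$$\sup_{f\in\mathcal{F}}\Big|\sum_{i\le N}\varepsilon_if(X_i)^2\Big| \le Lu\gamma_2(\mathcal{F}, d_{\psi_2})\big(\sqrt{N}\Delta^* + \gamma_2(\mathcal{F}, d_{\psi_2})\big) , \tag{14.45}$$

which by taking expectation and using (14.41) implies

$$\mathrm{E}\sup_{f\in\mathcal{F}}\Big|\sum_{i\le N}\varepsilon_if(X_i)^2\Big| \le L\big(\sqrt{N}\Delta^*\gamma_2(\mathcal{F}, d_{\psi_2}) + \gamma_2(\mathcal{F}, d_{\psi_2})^2\big) .$$

Using (11.34) and (11.35), this proves (14.33) and finishes the proof.
To prove (14.45), since $0 \in \mathcal{F}$, we may assume that $\pi_0(f) = 0$. For each $n$ with $1 \le n \le n_1$, (14.38) means that one has

$$\Big|\sum_{i\le N}\varepsilon_i\big(\pi_n(f)(X_i)^2 - \pi_{n-1}(f)(X_i)^2\big)\Big| \le Lu2^{n/2}\sqrt{N}\Delta^*\Delta(A_{n-1}(f), d_{\psi_2}) . \tag{14.46}$$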
For $n > n_1$, we write

$$\Big|\sum_{i\le N}\varepsilon_i\big(\pi_n(f)(X_i)^2 - \pi_{n-1}(f)(X_i)^2\big)\Big| \le \sum_{i\le N}\big|\pi_n(f)(X_i)^2 - \pi_{n-1}(f)(X_i)^2\big| . \tag{14.47}$$

Recalling the random norm $W(f)$ of (14.42), and since $a^2 - b^2 = (a-b)(a+b)$, using the Cauchy–Schwarz inequality, the right-hand side of (14.47) is at most

$$W(\pi_n(f) - \pi_{n-1}(f))\,W(\pi_n(f) + \pi_{n-1}(f)) \le W(\pi_n(f) - \pi_{n-1}(f))\big(W(\pi_n(f)) + W(\pi_{n-1}(f))\big) , \tag{14.48}$$

where we have used the triangle inequality for $W$, and from (14.44) and (14.43), this is at most

$$Lu2^{n/2}\Delta(A_{n-1}(f), d_{\psi_2})\big(\gamma_2(\mathcal{F}, d_{\psi_2}) + \sqrt{N}\Delta^*\big) .$$

Combining with (14.46) and summation over $n$ using (14.37) proves (14.45) and concludes the proof of Theorem 14.2.6. □
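For $n > n_1$, the only pointwise inequality the proof needs is $\sum_i|a_i^2 - b_i^2| \le W(a-b)\,(W(a) + W(b))$. Here $a = (\pi_n(f)(X_i))_{i\le N}$ and $b = (\pi_{n-1}(f)(X_i))_{i\le N}$, and $W$ is the Euclidean norm of the vector of values. A minimal Python sketch that checks this on synthetic data follows; the vectors are random stand-ins, not the output of an actual chaining.

```python
import math
import random

def W(values):
    """Random norm (14.42): Euclidean norm of the vector (f(X_i))_{i <= N}."""
    return math.sqrt(sum(v * v for v in values))

def increment_bound(a, b):
    """Return (lhs, rhs) with lhs = sum_i |a_i^2 - b_i^2| and
    rhs = W(a - b) * (W(a) + W(b)): Cauchy-Schwarz on (a-b)(a+b),
    then the triangle inequality W(a + b) <= W(a) + W(b)."""
    lhs = sum(abs(x * x - y * y) for x, y in zip(a, b))
    rhs = W([x - y for x, y in zip(a, b)]) * (W(a) + W(b))
    return lhs, rhs

rng = random.Random(0)
a = [rng.gauss(0.0, 1.0) for _ in range(1000)]
b = [x + 0.1 * rng.gauss(0.0, 1.0) for x in a]  # a nearby "previous link" (synthetic)
lhs, rhs = increment_bound(a, b)
print(lhs <= rhs)
```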


A statement similar to Theorem 14.2.6, but with considerably weaker hypotheses, was proved by S. Mendelson and G. Paouris [72]. We present here a key step of the proof of this result (the full proof is too technical to belong here), corresponding to (14.36): the control of the quantities $\sum_{i\le N}f(X_i)^2$. We follow the recent observation of W. Bednorz [15] that Fernique's argument of Theorem 7.13.1 works well here.


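Theorem 14.2.9 ([72]) Consider a (countable) class of functions $\mathcal{F}$ with $0 \in \mathcal{F}$. Consider two distances $d_1$ and $d_2$ on $\mathcal{F}$. Assume that given $f, f' \in \mathcal{F}$, then²

$$\forall u > 0, \quad \mu(|f - f'| \ge u) \le 2\exp\Big(-\min\Big(\frac{u^2}{d_2(f, f')^2}, \frac{u}{d_1(f, f')}\Big)\Big) . \tag{14.49}$$

Let $S = \gamma_2(\mathcal{F}, d_2) + \gamma_1(\mathcal{F}, d_1)$. Then

$$\mathrm{E}\sup_{f\in\mathcal{F}}\Big(\sum_{i\le N}f(X_i)^2\Big)^{1/2} \le L\big(S + \sqrt{N}\Delta(\mathcal{F}, d_2) + \sqrt{N}\Delta(\mathcal{F}, d_1)\big) . \tag{14.50}$$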


² In the paper [72], the assumption that one controls the diameter of $\mathcal{F}$ for $d_1$ and $d_2$ is relaxed into the much weaker condition that for a certain number $q > 4$ and a number $C$, we have $\forall f \in \mathcal{F}, \forall u > 0, \ \mu(|f| \ge u) \le (C/u)^q$.
The control on the size of $\mathcal{F}$ here is considerably weaker than the control of $\gamma_2(\mathcal{F}, d_{\psi_2})$ assumed in Theorem 14.2.6, since (14.49) holds for $d_1 = 0$ and $d_2$ the distance $d_{\psi_2}$ associated with the norm $\|\cdot\|_{\psi_2}$.

Before we start the proof, we must understand the tail behavior of sums $\sum_{i\ge 1}a_iY_i$, where the $a_i$ are numbers and where the independent r.v.s $Y_i$ satisfy the tail condition (14.49). We proved much more general results in Sect. 8.2, but the reader wanting a direct proof should try the following exercise:


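Exercise 14.2.10 Consider a centered r.v. $Y$, and assume that for two numbers $0 < B \le A$, we have

$$\forall u > 0, \quad \mathrm{P}(|Y| \ge u) \le 2\exp\Big(-\min\Big(\frac{u^2}{A^2}, \frac{u}{B}\Big)\Big) . \tag{14.51}$$

Then
$$0 \le \lambda \le 1/(2B) \;\Rightarrow\; \mathrm{E}\exp\lambda Y \le \exp(L\lambda^2A^2) . \tag{14.52}$$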


As in the proof of Bernstein's inequality (4.44), we deduce the following:

Lemma 14.2.11 Consider i.i.d. copies $(Y_i)_{i\le k}$ of a centered r.v. $Y$ which satisfies the condition (14.51). Then for numbers $(a_i)_{i\le k}$ and any $u > 0$, we have

$$\mathrm{P}\Big(\Big|\sum_{i\le k}a_iY_i\Big| \ge u\Big) \le L\exp\Big(-\frac{1}{L}\min\Big(\frac{u^2}{A^2\sum_{i\le k}a_i^2}, \frac{u}{B\max_{i\le k}|a_i|}\Big)\Big) . \tag{14.53}$$

A convenient way to use (14.53) is the following, which is now obvious:


Lemma 14.2.12 Consider i.i.d. copies $(Y_i)_{i\le k}$ of a centered r.v. $Y$ which satisfies the condition (14.51). If $w > 0$ and

$$v = LA\sqrt{w}\Big(\sum_{i\le k}a_i^2\Big)^{1/2} + LBw\max_{i\le k}|a_i| , \tag{14.54}$$

then

$$\mathrm{P}\Big(\Big|\sum_{i\le k}a_iY_i\Big| \ge v\Big) \le L\exp(-w) . \tag{14.55}$$

We will use this result for the r.v.s $Y_i = \varepsilon_i(f(X_i) - f'(X_i))$ for $f, f' \in \mathcal{F}$. According to (14.49), in (14.54) we may then take $A = d_2(f, f')$ and $B = d_1(f, f')$.

We denote by $B_{N,2}$ the unit ball of $\ell^2(N)$. We write $\Delta_1 := \Delta(\mathcal{F}, d_1)$ and $\Delta_2 := \Delta(\mathcal{F}, d_2)$. Consider an independent sequence $(\varepsilon_i)_{i\le N}$ of Bernoulli r.v.s. Theorem 14.2.9 is an immediate consequence of the following:
Theorem 14.2.13 Given a number $u > 0$, we have

$$\sup_{a\in B_{N,2}}\sup_{f\in\mathcal{F}}\Big|\sum_{i\le N}\varepsilon_ia_if(X_i)\Big| \le L\big(\sqrt{u}\gamma_2(\mathcal{F}, d_2) + u\gamma_1(\mathcal{F}, d_1) + u\sqrt{N}\Delta_1 + \sqrt{u}\sqrt{N}\Delta_2\big) \tag{14.56}$$

with probability $\ge 1 - L\exp(-Lu)$.

Indeed, taking the supremum over $a$ in the left-hand side of (14.56) shows that this left-hand side³ is $\sup_{f\in\mathcal{F}}(\sum_{i\le N}f(X_i)^2)^{1/2}$, and the conclusion follows by the general method described at the end of Sect. 2.4.

Before we prove Theorem 14.2.13, we need a simple lemma. We think of $N$ as fixed, and we denote by $e_{n,2}$ and $e_{n,\infty}$ the entropy numbers of $B_{N,2}$ for the $\ell^2$ and $\ell^\infty$ distances (see Sect. 2.5).

Lemma 14.2.14 We have

$$\sum_{n\ge 0}2^{n/2}e_{n,2} \le L\sqrt{N} ; \quad \sum_{n\ge 0}2^ne_{n,\infty} \le L\sqrt{N} . \tag{14.57}$$


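Proof of Theorem 14.2.13 We can find an increasing sequence of partitions $(\mathcal{A}_n)_{n\ge 0}$ of $\mathcal{F}$ with $\operatorname{card}\mathcal{A}_0 = 1$ and $\operatorname{card}\mathcal{A}_n \le N_{n+1}$ such that for each $f \in \mathcal{F}$,

$$\sum_{n\ge 0}\big(2^n\Delta(A_n(f), d_1) + 2^{n/2}\Delta(A_n(f), d_2)\big) \le L\big(\gamma_1(\mathcal{F}, d_1) + \gamma_2(\mathcal{F}, d_2)\big) . \tag{14.58}$$

We can find an increasing sequence of partitions $\mathcal{B}_n$ of $B_{N,2}$ such that $\operatorname{card}\mathcal{B}_0 = 1$, $\operatorname{card}\mathcal{B}_n \le N_{n+1}$ and

$$B \in \mathcal{B}_n \;\Rightarrow\; \Delta(B, d_2) \le e_{n,2} ; \quad \Delta(B, d_\infty) \le e_{n,\infty} . \tag{14.59}$$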


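We will perform chaining on $B_{N,2} \times \mathcal{F}$ using the increasing sequence of partitions $\mathcal{C}_n$ consisting of the sets $B \times A$ for $B \in \mathcal{B}_n$ and $A \in \mathcal{A}_n$. The chaining is routine once we prove the following inequality: If $f, f' \in A \in \mathcal{A}_n$ and $a, a' \in B \in \mathcal{B}_n$, then with probability $\ge 1 - L\exp(-u2^n)$

$$\Big|\sum_{i\le N}\varepsilon_ia_if(X_i) - \sum_{i\le N}\varepsilon_ia'_if'(X_i)\Big| \le L\big(u2^n\Delta(A, d_1) + \sqrt{u2^n}\Delta(A, d_2) + u2^ne_{n,\infty}\Delta_1 + \sqrt{u2^n}e_{n,2}\Delta_2\big) .$$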


³ So, this left-hand side actually does not depend on the values of the $\varepsilon_i$, but nonetheless these are required in order to use (14.55).
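To prove this, we start with the triangle inequality

$$\Big|\sum_{i\le N}\varepsilon_ia_if(X_i) - \sum_{i\le N}\varepsilon_ia'_if'(X_i)\Big| \le \Big|\sum_{i\le N}\varepsilon_i(a_i - a'_i)f(X_i)\Big| + \Big|\sum_{i\le N}\varepsilon_ia'_i\big(f(X_i) - f'(X_i)\big)\Big| .$$

We use (14.55) with $Y_i = \varepsilon_if(X_i)$, with $a_i - a'_i$ in place of $a_i$, $A = \Delta_2$, and $B = \Delta_1$ to obtain that the event

$$\Big|\sum_{i\le N}\varepsilon_i(a_i - a'_i)f(X_i)\Big| \ge L\big(\sqrt{u2^n}e_{n,2}\Delta_2 + u2^ne_{n,\infty}\Delta_1\big)$$

occurs with probability $\le L\exp(-Lu2^n)$ by (14.55). We use again (14.55), with now $Y_i = \varepsilon_i(f(X_i) - f'(X_i))$ and $a'_i$ in place of $a_i$, to obtain that the event

$$\Big|\sum_{i\le N}\varepsilon_ia'_i\big(f(X_i) - f'(X_i)\big)\Big| \ge L\big(\sqrt{u2^n}\Delta(A, d_2) + u2^n\Delta(A, d_1)\big)$$

also has probability $\le L\exp(-Lu2^n)$ (using now that $\sum_{i\le N}(a'_i)^2 \le 1$). □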


Let us recall the formula

$$\binom{n}{k} \le \Big(\frac{en}{k}\Big)^k = \exp\big(k\log(en/k)\big) . \tag{14.60}$$


Proof of Lemma 14.2.14 We prove only the statement related to $e_{n,\infty}$, since the other statement is much easier. Let us denote by $(e_i)_{i\le N}$ the canonical basis of $\ell^2(N)$. Consider the largest integer $k_1$ such that $2^{2k_1} \le N$. For $1 \le k \le k_1$, let us denote by $D_k$ the set of vectors of the type $2^{-k}\sum_{i\in I}\eta_ie_i$ where $\operatorname{card}I = 2^{2k}$ and $\eta_i \in \{-1, 0, 1\}$. It should be clear that a point of $B_{N,2}$ is within supremum distance $2^{-k}$ of a point of $D_1 + \dots + D_k$. Consequently,

$$\log N(B_{N,2}, d_\infty, 2^{-k}) \le \sum_{j\le k}\log\operatorname{card}D_j . \tag{14.61}$$

A vector in $D_j$ is determined by the choice of the set $I$ of cardinality $2^{2j}$ (for which there are at most $\binom{N}{2^{2j}}$ possibilities), and once this set is chosen, we have another $3^{2^{2j}}$ possibilities for the choice of the $\eta_i$ for $i \in I$. Thus, $\operatorname{card}D_j \le 3^{2^{2j}}\binom{N}{2^{2j}}$, so we have $\log\operatorname{card}D_j \le 2^{2j}\log(3eN/2^{2j})$ by (14.60). There is now plenty of room in the estimates, although the computations are messy. Fixing $\beta = 1/4$ and since $\log x \le Lx^\beta$ for $x \ge 1$, we then have $\log\operatorname{card}D_j \le L2^{2j(1-\beta)}N^\beta$, and (14.61) implies that for $k \le k_1$, we have $\log N(B_{N,2}, d_\infty, 2^{-k}) \le L2^{2k(1-\beta)}N^\beta$. Thus,
for $n \le 2k_1$, we have $e_{n,\infty} \le 2^{-k}$ when $L2^{2k(1-\beta)}N^\beta \le 2^n$, which occurs for $2^{-k} \approx N^{\beta/(2(1-\beta))}2^{-n/(2(1-\beta))}$. Since

$$\sum_{n\le 2k_1}2^nN^{\beta/(2(1-\beta))}2^{-n/(2(1-\beta))} \le L\sqrt{N} ,$$

this proves that $\sum_{n\le 2k_1}2^ne_{n,\infty} \le L\sqrt{N}$. We leave to the reader the much easier proof that $\sum_{n>2k_1}2^ne_{n,\infty} \le L\sqrt{N}$, using the fact that the quantity $e_{n,\infty}$ decreases very fast as $n \ge 2k_1$ increases. □
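The last summation can be checked numerically. Take $\beta = 1/4$, so that $N^{\beta/(2(1-\beta))} = N^{1/6}$ and $2^n2^{-n/(2(1-\beta))} = 2^{n/3}$, and let $k_1$ be the largest integer with $2^{2k_1} \le N$. The sketch below (our own illustration; the function name is hypothetical) computes the ratio of the sum to $\sqrt{N}$. Summing the geometric series bounds this ratio by $2^{1/3}/(2^{1/3} - 1) \approx 4.85$ for every $N$.

```python
import math

def head_sum_ratio(N, beta=0.25):
    """Return (sum_{n <= 2 k1} 2^n N^c1 2^(-n c2)) / sqrt(N), where
    c1 = beta / (2(1 - beta)), c2 = 1 / (2(1 - beta)) and k1 is the
    largest integer with 2^(2 k1) <= N (k1 = 0 if N < 4)."""
    k1 = 0
    while 4 ** (k1 + 1) <= N:
        k1 += 1
    c1 = beta / (2 * (1 - beta))
    c2 = 1 / (2 * (1 - beta))
    s = sum(2.0 ** n * N ** c1 * 2.0 ** (-n * c2) for n in range(2 * k1 + 1))
    return s / math.sqrt(N)

for N in (4, 1000, 4 ** 10, 10 ** 9):
    print(N, head_sum_ratio(N))
```

The ratio approaches, but stays below, the geometric-series constant when $N$ is a power of 4.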


14.3. When Not to Use Chaining 


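In this section, we work in the space $\mathbb{R}^n$ provided with the Euclidean norm $\|\cdot\|$. We denote by $\langle\cdot,\cdot\rangle$ the canonical duality of $\mathbb{R}^n$ with itself. We consider a sequence $(X_i)_{i\le N}$ of independent $\mathbb{R}^n$-valued random vectors, and we assume that

$$\|x\| \le 1 \;\Rightarrow\; \mathrm{E}\exp|\langle x, X_i\rangle| \le 2 \tag{14.62}$$

and

$$\max_{i\le N}\|X_i\| \le (Nn)^{1/4} . \tag{14.63}$$

Theorem 14.3.1 ([1, 2]) We have

$$\sup_{\|x\|\le 1}\Big|\sum_{i\le N}\big(\langle x, X_i\rangle^2 - \mathrm{E}\langle x, X_i\rangle^2\big)\Big| \le L\sqrt{nN} , \tag{14.64}$$

with probability $\ge 1 - L\exp(-(Nn)^{1/4}) - L\exp(-n)$.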


A particularly striking application of this theorem is to the case where the $X_i$ are i.i.d. with law $\mu$, where $\mu$ is isotropic (see (14.34)) and log-concave.⁴ It is known in that case (see [1] for the details) that for each $x \in \mathbb{R}^n$, we have $\|\langle x, X_i\rangle\|_{\psi_1} \le L\|x\|$, and the hypothesis (14.62) appears now completely natural. We may then write (14.64) as

$$\sup_{\|x\|\le 1}\Big|\frac{1}{N}\sum_{i\le N}\langle x, X_i\rangle^2 - \|x\|^2\Big| \le L\sqrt{\frac{n}{N}} .$$
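To see the $\sqrt{n/N}$ scaling in the simplest isotropic log-concave case, here is a small Monte Carlo sketch. It is our own illustration, with standard Gaussian vectors in $\mathbb{R}^2$ and function names chosen for this example. The Gaussian does not satisfy (14.62) and (14.63) exactly as stated: it needs rescaling, and its norm is unbounded. So this only illustrates the order of magnitude and does not check the theorem's hypotheses or constants. For $\|x\| \le 1$, $\sup|\frac1N\sum_i\langle x, X_i\rangle^2 - \|x\|^2|$ is the largest absolute eigenvalue of $\frac1N\sum_iX_iX_i^T - \mathrm{Id}$, which in dimension 2 has a closed form.

```python
import math
import random

def sup_deviation_2d(N, seed=0):
    """sup_{|x| <= 1} |(1/N) sum_i <x, X_i>^2 - |x|^2| for i.i.d. standard
    Gaussian X_i in R^2 (illustrative choice of isotropic log-concave law).
    Equals the largest |eigenvalue| of M = (1/N) sum_i X_i X_i^T - Id."""
    rng = random.Random(seed)
    s11 = s12 = s22 = 0.0
    for _ in range(N):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        s11 += g1 * g1
        s12 += g1 * g2
        s22 += g2 * g2
    a, b, c = s11 / N - 1.0, s12 / N, s22 / N - 1.0
    # eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]]
    mid = (a + c) / 2.0
    rad = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return max(abs(mid + rad), abs(mid - rad))

n = 2
for N in (100, 1000, 10000):
    print(N, sup_deviation_2d(N) / math.sqrt(n / N))  # stays of order 1
```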


⁴ For example, when $\mu$ is the uniform measure on a convex set, it is log-concave, as follows from the Brunn–Minkowski inequality.
Many of the ideas of the proof go back to a seminal paper of J. Bourgain [23].⁵ In the present case, rather than chaining, it is simpler to use the following elementary fact:

Lemma 14.3.2 In $\mathbb{R}^k$, there is a set $U$ with $\operatorname{card}U \le 5^k$ consisting of vectors of norm $\le 1$, with the property that $x \in 2\operatorname{conv}U = \operatorname{conv}2U$ whenever $\|x\| \le 1$. Consequently,

$$\forall x \in \mathbb{R}^k,\ \exists a \in U, \quad \sum_{i\le k}a_ix_i \ge \frac{1}{2}\Big(\sum_{i\le k}x_i^2\Big)^{1/2} . \tag{14.65}$$

Proof It follows from (2.47) that there exists a subset $U$ of the unit ball of $\mathbb{R}^k$ with $\operatorname{card}U \le 5^k$ such that every point of this ball is within distance $\le 1/2$ of a point of $U$. Given a point $x$ of the unit ball, we can inductively pick points $u_\ell$ in $U$ such that $\|x - \sum_{1\le\ell\le n}2^{-\ell+1}u_\ell\| \le 2^{-n}$, and this proves that $x \in 2\operatorname{conv}U$. Given $x \in \mathbb{R}^k$ and using that $x/\|x\| \in 2\operatorname{conv}U$, we obtain that $\|x\|^2 = \langle x, x\rangle = \|x\|\langle x, x/\|x\|\rangle \le 2\|x\|\sup_{a\in U}\langle x, a\rangle$, which proves (14.65). □


For $1 \le k \le N$, we use the notation

$$\varphi(k) = k\log(eN/k) ,$$

so that (14.60) becomes

$$\binom{N}{k} \le \Big(\frac{eN}{k}\Big)^k = \exp\varphi(k) . \tag{14.66}$$

Thus, $\varphi(k) \ge k$, the sequence $(\varphi(k))$ increases, and $\varphi(k) \ge \varphi(1) = 1 + \log N$.
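The elementary properties of $\varphi(k) = k\log(eN/k)$ on $1 \le k \le N$ are easy to confirm by direct computation. This sketch (our own; the helper names are illustrative) checks $\log\binom{N}{k} \le \varphi(k)$, $\varphi(k) \ge k$, monotonicity, and $\varphi(1) = 1 + \log N$, using the standard-library `math.comb`.

```python
import math

def phi(k, N):
    """phi(k) = k log(eN/k), for 1 <= k <= N."""
    return k * math.log(math.e * N / k)

def check_phi(N):
    """True if (14.66), phi(k) >= k, monotonicity and phi(1) = 1 + log N hold."""
    vals = [phi(k, N) for k in range(1, N + 1)]
    binom_ok = all(math.log(math.comb(N, k)) <= vals[k - 1] + 1e-9
                   for k in range(1, N + 1))
    ge_k = all(vals[k - 1] >= k - 1e-9 for k in range(1, N + 1))
    increasing = all(vals[i + 1] >= vals[i] for i in range(N - 1))
    at_one = abs(vals[0] - (1 + math.log(N))) < 1e-9
    return binom_ok and ge_k and increasing and at_one

print(check_phi(50), check_phi(300))  # True True
```

All four properties rely on $k \le N$: beyond $N$, $\log(eN/k) < 1$ and both $\varphi(k) \ge k$ and monotonicity fail.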
The difficult part of the proof of Theorem 14.3.1 is the control of the random quantities

$$A_k := \sup_{\|x\|\le 1}\ \sup_{\operatorname{card}I\le k}\Big(\sum_{i\in I}\langle x, X_i\rangle^2\Big)^{1/2} . \tag{14.67}$$

It will be achieved through the following:
Proposition 14.3.3 For $u > 0$, with probability $\ge 1 - L\exp(-u)$, we have

$$\forall k \ge 1, \quad A_k \le L\big(u + \varphi(k)/\sqrt{k} + \max_{i\le N}\|X_i\|\big) . \tag{14.68}$$


⁵ It would be nice if one could deduce Theorem 14.3.1 from a general principle such as Theorem 14.2.9, but unfortunately we do not know how to do this, even when the sequence $(X_i)$ is i.i.d.
Corollary 14.3.4 If $N \le n$, then with probability $\ge 1 - L\exp(-(nN)^{1/4})$, we have

$$\sup_{\|x\|\le 1}\sum_{i\le N}\langle x, X_i\rangle^2 \le L\sqrt{Nn} .$$

Proof The left-hand side is $A_N^2$. Since $\varphi(N) = N$, we may use (14.68) for $k = N$ and $u = (nN)^{1/4}$ and (14.63) to obtain $A_N \le L(Nn)^{1/4} + L\sqrt{N} \le L(Nn)^{1/4}$. □

We start the proof of Proposition 14.3.3 with the identity

$$A_k = \sup_{\|x\|\le 1}\ \sup_{\operatorname{card}I\le k}\ \sup_{\sum_{i\in I}a_i^2\le 1}\sum_{i\in I}a_i\langle x, X_i\rangle = \sup_{\operatorname{card}I\le k}\ \sup_{\sum_{i\in I}a_i^2\le 1}\Big\|\sum_{i\in I}a_iX_i\Big\| . \tag{14.69}$$

The proof will require several steps to progressively gain control.


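Lemma 14.3.5 Consider $x \in \mathbb{R}^n$ with $\|x\| \le 1$ and an integer $1 \le k \le N$. Then for $u > 0$, with probability $\ge 1 - L\exp(-u - 3\varphi(k))$, the following occurs: For each set $I \subset \{1, \dots, N\}$ with $\operatorname{card}I = m \ge k$, we have

$$\sum_{i\in I}|\langle x, X_i\rangle| \le 6\varphi(m) + u . \tag{14.70}$$

Proof Given a set $I$ with $\operatorname{card}I = m$, (14.62) implies $\mathrm{E}\exp\sum_{i\in I}|\langle x, X_i\rangle| \le 2^m \le \exp m \le \exp\varphi(m)$, and thus by Markov's inequality,

$$\mathrm{P}\Big(\sum_{i\in I}|\langle x, X_i\rangle| \ge 6\varphi(m) + u\Big) \le \exp(-5\varphi(m))\exp(-u) .$$

Since there are at most $\exp\varphi(m)$ choices for $I$ by (14.66), the union bound implies

$$\sum_{\operatorname{card}I\ge k}\mathrm{P}\Big(\sum_{i\in I}|\langle x, X_i\rangle| \ge 6\varphi(\operatorname{card}I) + u\Big) \le \sum_{m\ge k}\exp(-4\varphi(m))\exp(-u) .$$

Now we observe that $\varphi(m) \ge \varphi(k)$ for $m \ge k$ and that $\varphi(m) \ge m$, so that

$$\sum_{m\ge k}\exp(-4\varphi(m)) \le \exp(-3\varphi(k))\sum_{m\ge 1}\exp(-\varphi(m)) \le L\exp(-3\varphi(k)) .$$

□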


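Using Lemma 14.3.2, for each $1 \le k \le N$ and each subset $J$ of $\{1, \dots, N\}$ of cardinality $k$, we construct a subset $S_{k,J}$ of the unit ball of $\mathbb{R}^J$ with $\operatorname{card}S_{k,J} \le 5^k$ such that $\operatorname{conv}2S_{k,J}$ contains this unit ball. Consequently,

$$x \in \mathbb{R}^J \;\Rightarrow\; \sup_{a\in S_{k,J}}\sum_{j\in J}a_jx_j \ge \frac{1}{2}\Big(\sum_{j\in J}x_j^2\Big)^{1/2} . \tag{14.71}$$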
Lemma 14.3.6 With probability $\ge 1 - L\exp(-u)$, the following occurs. Consider disjoint subsets $I, J$ of $\{1, \dots, N\}$ with $\operatorname{card}I = m \ge \operatorname{card}J = k$, and consider any $a \in S_{k,J}$. Then

$$\sum_{i\in I}\Big|\Big\langle X_i, \sum_{j\in J}a_jX_j\Big\rangle\Big| \le (6\varphi(m) + u)\Big\|\sum_{j\in J}a_jX_j\Big\| . \tag{14.72}$$

Proof First, we prove that given $J$ and $a \in S_{k,J}$, the probability that (14.72) occurs for each choice of $I$ of cardinality $m \ge k$ and disjoint of $J$ is at least $1 - L(k/eN)^{3k}\exp(-u) = 1 - L\exp(-3\varphi(k) - u)$. To prove this, we show that this is the case given the r.v.s $X_j$ for $j \in J$ by using Lemma 14.3.5 for $x = y/\|y\|$, $y = \sum_{j\in J}a_jX_j$. Next, there are at most $\exp\varphi(k)$ choices of $J$ of cardinality $k$, and for each such $J$, there are at most $5^k$ choices for $a$. Moreover, since $\varphi(k) \ge k$,

$$\sum_{k\le N}\exp(-3\varphi(k))5^k\exp\varphi(k) = \sum_{k\le N}\exp(-2\varphi(k))5^k \le \sum_{k\ge 1}e^{-2k}5^k \le L .$$

The result then follows from the union bound. □
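The numerical series in this union bound is dominated by the geometric series $\sum_{k\ge 1}(5/e^2)^k$, which converges because $5/e^2 \approx 0.68 < 1$. A short check (our own sketch; names are illustrative) that $\sum_{k\le N}\exp(-2\varphi(k))5^k$ stays below that geometric sum, about 2.1, for several $N$:

```python
import math

def phi(k, N):
    """phi(k) = k log(eN/k)."""
    return k * math.log(math.e * N / k)

def lemma_series(N):
    """sum_{k <= N} exp(-3 phi(k)) 5^k exp(phi(k)) = sum_{k <= N} exp(-2 phi(k)) 5^k."""
    return sum(math.exp(-2.0 * phi(k, N) + k * math.log(5.0)) for k in range(1, N + 1))

r = 5.0 / math.e ** 2            # ratio of the dominating geometric series
GEOMETRIC_BOUND = r / (1.0 - r)  # sum_{k >= 1} (5/e^2)^k, about 2.09

for N in (1, 10, 100, 1000):
    print(N, lemma_series(N), GEOMETRIC_BOUND)
```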


Corollary 14.3.7 For $u > 0$, with probability $\ge 1 - L\exp(-u)$, the following occurs. Consider disjoint subsets $I, J$ of $\{1, \dots, N\}$ with $\operatorname{card}I = m \ge \operatorname{card}J = k$, and consider any sequence $(b_j)_{j\in J}$ with $\sum_{j\in J}b_j^2 \le 1$. Then

$$\sum_{i\in I}\Big|\Big\langle X_i, \sum_{j\in J}b_jX_j\Big\rangle\Big| \le L(\varphi(m) + u)A_k . \tag{14.73}$$

Proof With probability $\ge 1 - L\exp(-u)$, (14.72) occurs for every choice of $a \in S_{k,J}$. We prove that then (14.73) holds. Since $\sum_{j\in J}a_j^2 \le 1$ for $a \in S_{k,J}$, for each such sequence, we then have $\|\sum_{j\in J}a_jX_j\| \le A_k$, and then (14.72) implies

$$\sum_{i\in I}\Big|\Big\langle X_i, \sum_{j\in J}a_jX_j\Big\rangle\Big| \le (6\varphi(m) + u)A_k .$$

Now, each sequence $(b_j)_{j\in J}$ with $\sum_{j\in J}b_j^2 \le 1$ is in the convex hull of $2S_{k,J}$, and this proves (14.73). □
We will need the following elementary property of the function $\varphi$:

Lemma 14.3.8 For $1 \le x \le y \le N$, we have

$$\varphi(x) \le L(x/y)^{3/4}\varphi(y) . \tag{14.74}$$

Proof The function $x \mapsto x^{1/4}\log(eN/x)$ increases for $1 \le x \le Ne^{-3}$, so that (14.74) holds with $L = 1$ for $y \le Ne^{-3}$, and $\varphi(y)$ is about $\varphi(N)$ for $Ne^{-3} \le y \le N$. □
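Following the case analysis in the proof, one can check that $L = e^{9/4} \approx 9.49$ suffices in (14.74). This explicit constant is our own computation, not stated in the text: when $y > Ne^{-3}$, the loss is at most $(N/(Ne^{-3}))^{3/4} = e^{9/4}$. The sketch below evaluates the worst ratio $\varphi(x)/((x/y)^{3/4}\varphi(y))$ over a geometric grid of real points $1 \le x \le y \le N$.

```python
import math

def phi(t, N):
    """phi(t) = t log(eN/t), extended to real 1 <= t <= N."""
    return t * math.log(math.e * N / t)

def worst_ratio(N, steps=200):
    """Max of phi(x) / ((x/y)^(3/4) phi(y)) over a geometric grid
    1 = N^0 <= ... <= N^1 = N, restricted to pairs x <= y."""
    grid = [N ** (i / steps) for i in range(steps + 1)]
    worst = 0.0
    for j, y in enumerate(grid):
        for x in grid[: j + 1]:
            worst = max(worst, phi(x, N) / ((x / y) ** 0.75 * phi(y, N)))
    return worst

for N in (10, 1000, 10 ** 6):
    print(N, worst_ratio(N), math.exp(2.25))
```

A grid check is of course not a proof; it only confirms that the constant suggested by the case analysis is not contradicted numerically.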
We are now ready for the main step of the proof of Proposition 14.3.3.

Proposition 14.3.9 When the event of Corollary 14.3.7 occurs, for any two disjoint sets $I, J$ of cardinality $\le k$ and any sequence $(a_i)_{i\le N}$ with $\sum_{i\le N}a_i^2 \le 1$, we have

$$\Big|\Big\langle\sum_{i\in I}a_iX_i, \sum_{j\in J}a_jX_j\Big\rangle\Big| \le L\big(u + \varphi(k)/\sqrt{k}\big)A_k . \tag{14.75}$$

Proof The idea is to suitably group the terms depending on the values of the coefficients $(a_i)$. Let $k_1 = \operatorname{card}I$, and let us enumerate $I = \{i_1, \dots, i_{k_1}\}$ in such a way that the sequence $(|a_{i_s}|)_{1\le s\le k_1}$ is non-increasing. Let us define a partition $(I'_\ell)_{0\le\ell\le\ell_1}$ of $I$ as follows. Consider the largest integer $\ell_1$ with $2^{\ell_1} \le 2\operatorname{card}I = 2k_1$. For $0 \le \ell < \ell_1$, let $I_\ell = \{i_1, \dots, i_{2^\ell}\}$, and let $I_{\ell_1} = I$. We set $I'_0 = I_0$, and for $1 \le \ell \le \ell_1$, we set $I'_\ell = I_\ell \setminus I_{\ell-1}$, so that the sets $I'_\ell$ for $0 \le \ell \le \ell_1$ form a partition of $I$. For $0 \le \ell \le \ell_1$, we set $y_\ell = \sum_{i\in I'_\ell}a_iX_i$, so that

$$\sum_{i\in I}a_iX_i = \sum_{0\le\ell\le\ell_1}y_\ell .$$

Let us then define similarly for $0 \le \ell \le \ell_2$ sets $J_\ell \subset J$ with $\operatorname{card}J_\ell = 2^\ell$ for $\ell < \ell_2$, sets $J'_\ell$ and elements $z_\ell = \sum_{j\in J'_\ell}a_jX_j$, so that $\sum_{j\in J}a_jX_j = \sum_{0\le\ell\le\ell_2}z_\ell$.
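Without loss of generality, we assume $\operatorname{card}I \ge \operatorname{card}J$, so that $\ell_1 \ge \ell_2$. We write

$$\Big\langle\sum_{i\in I}a_iX_i, \sum_{j\in J}a_jX_j\Big\rangle = \Big\langle\sum_{0\le\ell\le\ell_1}y_\ell, \sum_{0\le\ell'\le\ell_2}z_{\ell'}\Big\rangle = \mathrm{I} + \mathrm{II} , \tag{14.76}$$

where

$$\mathrm{I} = \sum_{0\le\ell\le\ell_1}\Big\langle y_\ell, \sum_{0\le\ell'\le\min(\ell,\ell_2)}z_{\ell'}\Big\rangle ; \quad \mathrm{II} = \sum_{0\le\ell'\le\ell_2}\Big\langle\sum_{0\le\ell<\ell'}y_\ell, z_{\ell'}\Big\rangle .$$

This identity is obvious if we observe that I is the sum of the quantities $\langle y_\ell, z_{\ell'}\rangle$ over the set $\{0 \le \ell \le \ell_1, 0 \le \ell' \le \ell_2, \ell' \le \ell\}$, whereas II is the sum of these quantities over the set $\{0 \le \ell \le \ell_1, 0 \le \ell' \le \ell_2, \ell' > \ell\}$.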
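The split into I and II is pure index bookkeeping. It separates the pairs $(\ell, \ell')$ with $\ell' \le \ell$ from those with $\ell' > \ell$. A tiny sketch (ours; the integer matrix `G` is a hypothetical stand-in for the inner products $\langle y_\ell, z_{\ell'}\rangle$) makes this concrete and uses integers so the equality check is exact.

```python
import random

def split_I_II(G, l1, l2):
    """G[l][lp] stands for <y_l, z_lp>, with 0 <= l <= l1, 0 <= lp <= l2, l2 <= l1.
    Returns (full double sum, I, II) following the decomposition (14.76)."""
    full = sum(G[l][lp] for l in range(l1 + 1) for lp in range(l2 + 1))
    # I: pairs with lp <= l
    part_I = sum(G[l][lp] for l in range(l1 + 1) for lp in range(min(l, l2) + 1))
    # II: pairs with l < lp (automatically l <= l1 since lp <= l2 <= l1)
    part_II = sum(G[l][lp] for lp in range(l2 + 1) for l in range(lp))
    return full, part_I, part_II

rng = random.Random(1)
l1, l2 = 5, 3
G = [[rng.randint(-9, 9) for _ in range(l2 + 1)] for _ in range(l1 + 1)]
full, part_I, part_II = split_I_II(G, l1, l2)
print(full == part_I + part_II)  # True
```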

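We bound I. Since the sequence $(|a_{i_s}|)$ is non-increasing, we have $s|a_{i_s}|^2 \le \sum_{r\le s}|a_{i_r}|^2 \le 1$, so that $|a_{i_s}| \le 1/\sqrt{s}$ and in particular $|a_i| \le 2^{-\ell/2+1}$ for $i \in I'_\ell$. Next, for each vector $x$ and each $0 \le \ell \le \ell_1$, we have

$$|\langle y_\ell, x\rangle| = \Big|\sum_{i\in I'_\ell}a_i\langle X_i, x\rangle\Big| \le \sum_{i\in I'_\ell}|a_i||\langle X_i, x\rangle| \le 2^{-\ell/2+1}\sum_{i\in I'_\ell}|\langle X_i, x\rangle| . \tag{14.77}$$

Using this inequality for $x = \sum_{0\le\ell'\le\min(\ell,\ell_2)}z_{\ell'} = \sum_{j\in J_\ell}a_jX_j$ (where $J_\ell = J$ for $\ell \ge \ell_2$) and since $I'_\ell \subset I_\ell$, we get

$$\Big|\Big\langle y_\ell, \sum_{0\le\ell'\le\min(\ell,\ell_2)}z_{\ell'}\Big\rangle\Big| \le 2^{-\ell/2+1}\sum_{i\in I_\ell}\Big|\Big\langle X_i, \sum_{j\in J_\ell}a_jX_j\Big\rangle\Big| .$$

Thus, we may use (14.73) for $I = I_\ell$ and $J = J_\ell$, and

$$\operatorname{card}I_\ell = \min(2^\ell, \operatorname{card}I) \ge \min(2^\ell, \operatorname{card}J) = \operatorname{card}J_\ell$$

to obtain, since $A_{\operatorname{card}J_\ell} \le A_k$,

$$\sum_{i\in I_\ell}\Big|\Big\langle X_i, \sum_{j\in J_\ell}a_jX_j\Big\rangle\Big| \le L\big(u + \varphi(\operatorname{card}I_\ell)\big)A_k \le L\big(u + (\operatorname{card}I_\ell)^{3/4}k^{-3/4}\varphi(k)\big)A_k ,$$

using (14.74) in the second inequality. Thus, we have shown that

$$\mathrm{I} \le L\sum_{0\le\ell\le\ell_1}2^{-\ell/2}\big(u + (\operatorname{card}I_\ell)^{3/4}k^{-3/4}\varphi(k)\big)A_k .$$

Since $\operatorname{card}I_\ell \le 2^\ell$, we have $\sum_{0\le\ell\le\ell_1}2^{-\ell/2}(\operatorname{card}I_\ell)^{3/4} \le Lk^{1/4}$, so that $\mathrm{I} \le L(u + \varphi(k)/\sqrt{k})A_k$. The same argument proves that this bound also holds for II (using now that $\operatorname{card}I_{\ell'-1} \le 2^{\ell'-1} \le \operatorname{card}J_{\ell'}$ if $\ell' \le \ell_2$). □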
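The dyadic grouping can be mirrored in a few lines. Take a set of $r$ indices whose coefficients are sorted by decreasing absolute value, with $\sum a_i^2 \le 1$. The prefix sets have sizes $2^\ell$ for $\ell < \ell_1$, the last one is the whole set, and $\ell_1$ is the largest integer with $2^{\ell_1} \le 2r$. The sketch below (our own; function names are illustrative) checks the coefficient bound $|a_i| \le 2^{-\ell/2+1}$ on each block $I'_\ell$. It also returns $\sum_\ell 2^{-\ell/2}(\operatorname{card}I_\ell)^{3/4}/r^{1/4}$. Summing the geometric series bounds this quantity by $\sqrt{2}/(2^{1/4} - 1) \approx 7.47$; that explicit constant is our computation.

```python
import math
import random

def prefix_sizes(r):
    """card I_l for 0 <= l <= l1, where l1 is the largest integer with
    2^l1 <= 2r: card I_l = 2^l for l < l1, and card I_l1 = r."""
    l1 = (2 * r).bit_length() - 1
    return [2 ** l for l in range(l1)] + [r]

def check_blocks(a):
    """a: coefficients sorted so |a[0]| >= |a[1]| >= ..., with sum a_i^2 <= 1.
    Asserts |a_i| <= 2^(-l/2+1) on each block I'_l and returns
    sum_l 2^(-l/2) (card I_l)^(3/4) / r^(1/4)."""
    r = len(a)
    sizes = prefix_sizes(r)
    prev = 0
    for l, size in enumerate(sizes):
        for s in range(prev, size):  # 0-based positions belonging to I'_l
            assert abs(a[s]) <= 2.0 ** (-l / 2 + 1)
        prev = size
    total = sum(2.0 ** (-l / 2) * size ** 0.75 for l, size in enumerate(sizes))
    return total / r ** 0.25

rng = random.Random(2)
raw = sorted((rng.gauss(0.0, 1.0) for _ in range(37)), key=abs, reverse=True)
norm = math.sqrt(sum(x * x for x in raw))
print(check_blocks([x / norm for x in raw]))
```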


Proposition 14.3.10 When the event of Corollary 14.3.7 occurs, we have

$$\forall k \ge 1, \quad A_k^2 \le \max_{i\le N}\|X_i\|^2 + L\big(u + \varphi(k)/\sqrt{k}\big)A_k . \tag{14.78}$$

Proof We fix once and for all an integer $k$. Consider a subset $W$ of $\{1, \dots, N\}$ with $\operatorname{card}W = k$. Consider $(a_i)_{i\in W}$ with $\sum_{i\in W}a_i^2 \le 1$. Then

$$\Big\|\sum_{i\in W}a_iX_i\Big\|^2 = \sum_{i\in W}a_i^2\|X_i\|^2 + \sum_{i,j\in W, i\ne j}\langle a_iX_i, a_jX_j\rangle . \tag{14.79}$$

We use the obvious bound for the first term:

$$\sum_{i\in W}a_i^2\|X_i\|^2 \le \max_{i\le N}\|X_i\|^2 . \tag{14.80}$$

For the second term, we use a standard "decoupling device". Consider independent Bernoulli r.v.s $\varepsilon_i$, and observe that for $i \ne j$, we have $\mathrm{E}(1 - \varepsilon_i)(1 + \varepsilon_j) = 1$, so
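that by linearity of expectation, and denoting by $\mathrm{E}_\varepsilon$ expectation in the r.v.s $\varepsilon_i$ only,

$$\sum_{i,j\in W, i\ne j}\langle a_iX_i, a_jX_j\rangle = \mathrm{E}_\varepsilon\sum_{i,j\in W, i\ne j}(1 - \varepsilon_i)(1 + \varepsilon_j)\langle a_iX_i, a_jX_j\rangle .$$

Given $(\varepsilon_i)$, observe that if $I = \{i \in W ; \varepsilon_i = -1\}$ and $J = W \setminus I$,

$$\frac{1}{4}\sum_{i,j\in W, i\ne j}(1 - \varepsilon_i)(1 + \varepsilon_j)\langle a_iX_i, a_jX_j\rangle = \sum_{i\in I, j\in J}\langle a_iX_i, a_jX_j\rangle = \Big\langle\sum_{i\in I}a_iX_i, \sum_{j\in J}a_jX_j\Big\rangle . \tag{14.81}$$

The bound (14.75) completes the proof. □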
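The decoupling identity can be verified exactly on a small example by enumerating all sign patterns. The claim is that $\sum_{i\ne j}\langle v_i, v_j\rangle$ equals 4 times the average over $\varepsilon \in \{-1, 1\}^k$ of $\langle\sum_{\varepsilon_i=-1}v_i, \sum_{\varepsilon_j=+1}v_j\rangle$. Here $v_i$ plays the role of $a_iX_i$, and the vectors in this sketch are random stand-ins (our own illustration).

```python
import itertools
import random

def inner(u, v):
    return sum(p * q for p, q in zip(u, v))

def off_diagonal(vecs):
    """sum_{i != j} <v_i, v_j>, where v_i stands for a_i X_i."""
    k = len(vecs)
    return sum(inner(vecs[i], vecs[j]) for i in range(k) for j in range(k) if i != j)

def decoupled(vecs):
    """4 * average over eps in {-1, 1}^k of <sum_{eps_i = -1} v_i, sum_{eps_j = +1} v_j>:
    each ordered pair i != j lands in (I, J) for exactly 1/4 of the sign patterns."""
    k, dim = len(vecs), len(vecs[0])
    total = 0.0
    for eps in itertools.product((-1, 1), repeat=k):
        left = [sum(v[d] for v, e in zip(vecs, eps) if e == -1) for d in range(dim)]
        right = [sum(v[d] for v, e in zip(vecs, eps) if e == 1) for d in range(dim)]
        total += inner(left, right)
    return 4.0 * total / 2 ** k

rng = random.Random(3)
vecs = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(6)]
print(abs(off_diagonal(vecs) - decoupled(vecs)) < 1e-9)  # True
```

The enumeration costs $2^k$ terms, so this is only a sanity check for small $k$; the proof uses the identity only to pass to the worst choice of $(I, J)$.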


Proof of Proposition 14.3.3 We use that $BA_k \le (B^2 + A_k^2)/2$ with $B = L(u + \varphi(k)/\sqrt{k})$ to deduce from (14.78) that $A_k^2 \le L(\max_{i\le N}\|X_i\|^2 + (u + \varphi(k)/\sqrt{k})^2)$. This implies (14.68). □


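Proof of Theorem 14.3.1. According to Corollary 14.3.4, we may assume $N \ge n$. According to Lemma 14.3.2, there exists a subset $U$ of $\mathbb{R}^n$ with $\operatorname{card}U \le 5^n$, consisting of elements of norm $\le 2$ and such that its convex hull contains the unit ball of $\mathbb{R}^n$. Thus (considering that one may take $y = x$ to obtain the first inequality),

$$\begin{aligned}
\sup_{\|x\|\le 1}\Big|\sum_{i\le N}\big(\langle x, X_i\rangle^2 - \mathrm{E}\langle x, X_i\rangle^2\big)\Big|
&\le \sup_{\|x\|,\|y\|\le 1}\Big|\sum_{i\le N}\big(\langle x, X_i\rangle\langle y, X_i\rangle - \mathrm{E}\langle x, X_i\rangle\langle y, X_i\rangle\big)\Big| \\
&\le \sup_{x,y\in U}\Big|\sum_{i\le N}\big(\langle x, X_i\rangle\langle y, X_i\rangle - \mathrm{E}\langle x, X_i\rangle\langle y, X_i\rangle\big)\Big| .
\end{aligned} \tag{14.82}$$

The plan is to assume that

$$\forall k \ge 1, \quad A_k \le L\big(\sqrt{k}\log(eN/k) + (Nn)^{1/4}\big) , \tag{14.83}$$

and to prove that then with probability $\ge 1 - \exp(-n)$, the right-hand side of (14.82) is $\le L\sqrt{Nn}$. This completes the proof of Theorem 14.3.1 because Proposition 14.3.3 and (14.63) show that (14.83) occurs with probability $\ge 1 - L\exp(-(Nn)^{1/4})$.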


Consider a truncation level $B > 0$ which we will determine later (depending only on $N$ and $n$), and define

$$Z_i(x, y) = \langle x, X_i\rangle\langle y, X_i\rangle\mathbf{1}_{\{|\langle x, X_i\rangle\langle y, X_i\rangle|\le B\}}$$

and

$$Y_i(x, y) = \langle x, X_i\rangle\langle y, X_i\rangle\mathbf{1}_{\{|\langle x, X_i\rangle\langle y, X_i\rangle|>B\}} ,$$

so that $\langle x, X_i\rangle\langle y, X_i\rangle = Z_i(x, y) + Y_i(x, y)$ and $Y_i(x, y) \ne 0 \Rightarrow |Y_i(x, y)| \ge B$. This argument is yet another instance of a decomposition into a "spread out part" and a "peaky part". The peaky part $\sum_iY_i(x, y)$ will be controlled as usual without using cancellations, i.e., we will control $\sum_i|Y_i(x, y)|$.⁶ We bound the right-hand side of (14.82) by I + II + III, where

$$\mathrm{I} = \sup_{x,y\in U}\Big|\sum_{i\le N}\big(Z_i(x, y) - \mathrm{E}Z_i(x, y)\big)\Big| , \tag{14.84}$$

$$\mathrm{II} = \sup_{x,y\in U}\sum_{i\le N}|Y_i(x, y)| , \tag{14.85}$$

$$\mathrm{III} = \sup_{x,y\in U}\sum_{i\le N}\mathrm{E}|Y_i(x, y)| . \tag{14.86}$$


The fun is to bound II. We prove that when (14.83) occurs, then $\mathrm{II} \le L\sqrt{Nn}$. For this, let us fix $x, y \in U$ and set

$$I = \{i \le N ; |Y_i(x, y)| > B\} = \{i \le N ; Y_i(x, y) \ne 0\} .$$

Defining $m = \operatorname{card}I$, we have, using the Cauchy–Schwarz inequality in the second inequality, and recalling the definition (14.67) of $A_m$,

$$mB \le \sum_{i\in I}|Y_i(x, y)| \le \Big(\sum_{i\in I}\langle x, X_i\rangle^2\Big)^{1/2}\Big(\sum_{i\in I}\langle y, X_i\rangle^2\Big)^{1/2} \le 4A_m^2 , \tag{14.87}$$

and thus from (14.83),

$$mB \le L\big(m(\log(eN/m))^2 + \sqrt{Nn}\big) . \tag{14.88}$$

Without loss of generality, we assume from Corollary 14.3.4 that $N \ge n$ and then $N \ge \sqrt{Nn}$. Thus, we may consider the smallest integer $k_0 \le N$ such that $k_0(\log(eN/k_0))^2 \ge \sqrt{Nn}$.


⁶ And as usual this control is far harder than the control of the cancellations.
Let us now choose $B = 2L_1(\log(eN/k_0))^2$, where $L_1$ denotes the constant in (14.88). Assuming if possible that $m > k_0$, then (14.88) implies

$$2L_1m(\log(eN/k_0))^2 = mB \le L_1\big(m(\log(eN/m))^2 + \sqrt{Nn}\big) \le L_1\big(m(\log(eN/k_0))^2 + \sqrt{Nn}\big) ,$$

so that $m(\log(eN/k_0))^2 \le \sqrt{Nn}$ and thus

$$k_0(\log(eN/k_0))^2 < m(\log(eN/k_0))^2 \le \sqrt{Nn} .$$

This is impossible by the definition of $k_0$, so that we have proved that $m \le k_0$. By definition of $k_0$, we then have $m(\log(eN/m))^2 \le \sqrt{Nn}$, i.e., $\sqrt{m}\log(eN/m) \le (Nn)^{1/4}$. Thus, by (14.83), we have $A_m \le L(Nn)^{1/4}$, and finally by (14.87) and since $\sum_{i\le N}|Y_i(x, y)| = \sum_{i\in I}|Y_i(x, y)|$, we obtain that $\mathrm{II} \le L\sqrt{Nn}$.

Next, let us control III. Recalling (14.63), and since the elements of $U$ have norm $\le 2$, we have $\mathrm{II} \le 4\sum_{i\le N}\|X_i\|^2 \le 4N\sqrt{Nn}$. Moreover, we have just shown that when (14.83) occurs, i.e., with probability $\ge 1 - L\exp(-(Nn)^{1/4})$, we have in fact $\mathrm{II} \le L\sqrt{Nn}$. Thus, $\mathrm{III} \le \mathrm{E}\,\mathrm{II} \le L\sqrt{Nn} + L\exp(-(Nn)^{1/4})N\sqrt{Nn} \le L\sqrt{Nn}$.


It remains to bound I. Since $(\log x)^2 \le L\sqrt{x}$ for $x \ge e$,

$$\sqrt{Nn} \le k_0(\log(eN/k_0))^2 \le Lk_0\sqrt{N/k_0} ,$$

and thus $k_0 \ge n/L$. Therefore, with huge room to spare,

$$B = 2L_1(\log(eN/k_0))^2 \le L\sqrt{N/n} .$$
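The choice of $k_0$ and the lower bound $k_0 \ge n/L$ can be made explicit. Since $(\log x)^2 \le (16/e^2)\sqrt{x}$ for $x \ge 1$ (the maximum of $(\log x)^2/\sqrt{x}$ is at $x = e^4$), the argument gives $k_0 \ge ne^3/256$ whenever $N \ge n$. This explicit constant is our own computation. The sketch below finds $k_0$ by direct search for a few illustrative pairs $(N, n)$ and checks that bound.

```python
import math

def smallest_k0(N, n):
    """Smallest integer k0 <= N with k0 * log(eN/k0)^2 >= sqrt(N n).
    Such k0 exists when N >= n, since k0 = N gives N >= sqrt(N n)."""
    target = math.sqrt(N * n)
    for k in range(1, N + 1):
        if k * math.log(math.e * N / k) ** 2 >= target:
            return k
    raise ValueError("no admissible k0: the argument assumes N >= n")

# Lower bound derived from (log x)^2 <= (16/e^2) sqrt(x): k0 >= n e^3 / 256.
for N, n in [(1000, 10), (1000, 1000), (10 ** 5, 50), (5000, 4000)]:
    k0 = smallest_k0(N, n)
    print(N, n, k0, k0 >= n * math.e ** 3 / 256)
```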


Since $|Z_i(x, y)| \le B$ and $\mathrm{E}Z_i(x, y)^2 \le L$ (using the Cauchy–Schwarz inequality and (14.62)), it follows from Bernstein's inequality (4.44) that

$$\mathrm{P}\Big(\Big|\sum_{i\le N}\big(Z_i(x, y) - \mathrm{E}Z_i(x, y)\big)\Big| \ge t\Big) \le 2\exp\Big(-\frac{1}{L}\min\Big(\frac{t^2}{N}, \frac{t}{B}\Big)\Big) .$$

The right-hand side is $\le 5^{-3n}$ for $t = L\sqrt{Nn}$. There are at most $5^{2n}$ choices for the pair $(x, y) \in U^2$, so that by the union bound $\mathrm{I} \le L\sqrt{Nn}$ with probability $\ge 1 - 5^{-n} \ge 1 - \exp(-n)$. This completes the proof of Theorem 14.3.1. □


Chapter 15
Gaussian Chaos


Gaussian chaos are simply polynomials in Gaussian r.v.s. In this chapter, we investigate two questions related to chaos. Since we understand the boundedness of Gaussian processes well, we might hope that the same is true for "chaos processes". Unfortunately, even in the simplest case of order 2 chaos, this is far from being the case, as we will explain in Sect. 15.1. It is striking that there exist apparently rather different methods to bound a chaos process, and it remains unclear how (if at all possible) we may describe the supremum of a chaos process in terms of geometric characteristics of the index set. Section 15.2 investigates a different topic, the size of the tails of a single chaos, a deep result of R. Latała.


15.1. Order 2 Gaussian Chaos 


15.1.1. Basic Facts


Consider independent standard Gaussian sequences $(g_i)$, $(g'_j)$, $i, j \ge 1$. Given a double sequence $t = (t_{i,j})_{i,j\ge 1}$, we consider the r.v.

$$X_t = \sum_{i,j\ge 1}t_{i,j}g_ig'_j . \tag{15.1}$$

The series converges in $L^2$ as soon as $\sum_{i,j}t_{i,j}^2 < \infty$, but for the present purpose of proving inequalities, we may as well assume that only finitely many coefficients $t_{i,j}$ are not 0. This random variable is called a (decoupled) order 2 Gaussian chaos. There is also a theory of non-decoupled chaos, $\sum_{i,j}t_{i,j}g_ig_j$. For the present purposes of finding upper bounds, this theory reduces to the decoupled case using well-understood arguments such as the following:
Lemma 15.1.1 We have

$$\mathrm{E}\sup_{t\in T}\Big|\sum_{i\ne j}t_{i,j}g_ig_j + \sum_{i\ge 1}t_{i,i}(g_i^2 - 1)\Big| \le 2\,\mathrm{E}\sup_{t\in T}\Big|\sum_{i,j}t_{i,j}g_ig'_j\Big| . \tag{15.2}$$

Proof It is obvious by Jensen's inequality (taking the expectation in the r.v.s $g'_i$ inside rather than outside the supremum and the absolute values, and noting that this expectation of $(g_i + g'_i)(g_j - g'_j)$ is $g_ig_j$ for $i \ne j$ and $g_i^2 - 1$ for $i = j$) that

$$\mathrm{E}\sup_{t\in T}\Big|\sum_{i\ne j}t_{i,j}g_ig_j + \sum_{i\ge 1}t_{i,i}(g_i^2 - 1)\Big| \le \mathrm{E}\sup_{t\in T}\Big|\sum_{i,j}t_{i,j}(g_i + g'_i)(g_j - g'_j)\Big| ,$$

and the right-hand side is just the right-hand side of (15.2) because the families $(g_i + g'_i)/\sqrt{2}$ and $(g_i - g'_i)/\sqrt{2}$ are independent sequences of standard Gaussian r.v.s independent of each other. □


Given a finite family T of double sequences t = (t;,;), we would like to find 
upper and lower bounds for the quantity 


$$S(T) = \mathrm{E}\sup_{t\in T}X_t . \tag{15.3}$$


We would like in fact to understand the value of S(T) as a function of “the geometry 
of T” as we did in the case of Gaussian processes. Surprisingly, the difficulty of this 
problem is of an entirely different magnitude from the Gaussian case, and we have 
only limited results to offer at this point. 

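Let us start with the basics. We denote by $B$ the unit ball of $\ell^2(\mathbb{N}^*)$, $B = \{\alpha = (\alpha_j)_{j\ge 1} ; \sum_{j\ge 1}\alpha_j^2 \le 1\}$, and we note the following fundamental fact: For real numbers $(x_j)_{j\ge 1}$, we have

$$\Big(\sum_{j\ge 1}x_j^2\Big)^{1/2} = \sup_{\alpha\in B}\sum_{j\ge 1}\alpha_jx_j . \tag{15.4}$$

Given an array $t = (t_{i,j})$, we define

$$\|t\| = \sup\Big\{\Big(\sum_{i\ge 1}\Big(\sum_{j\ge 1}\alpha_jt_{i,j}\Big)^2\Big)^{1/2} ; \alpha \in B\Big\} = \sup\Big\{\sum_{i,j\ge 1}\alpha_j\beta_it_{i,j} ; \sum_{j\ge 1}\alpha_j^2 \le 1, \sum_{i\ge 1}\beta_i^2 \le 1\Big\} .$$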
If we think of $t$ as a matrix, $\|t\|$ is the operator norm of $t$ from $\ell^2$ to $\ell^2$. We will also need the Hilbert–Schmidt norm of this matrix, given by

$$\|t\|_{HS} = \Big(\sum_{i,j\ge 1}t_{i,j}^2\Big)^{1/2} .$$

Thus, $\|t\| \le \|t\|_{HS}$ by the Cauchy–Schwarz inequality.
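To make the two norms concrete, here is a small pure-Python sketch (our own illustration) that computes $\|t\|_{HS}$ and estimates the operator norm $\|t\|$ by power iteration on $t^Tt$. Power iteration from the all-ones vector is an illustrative choice. It can fail if that vector happens to be orthogonal to the top singular vector, so in practice one would call an SVD routine from a numerical library instead. A diagonal array shows that $\|t\|$ can be much smaller than $\|t\|_{HS}$, while a rank-one array shows that the two can coincide.

```python
import math

def hs_norm(t):
    """Hilbert-Schmidt norm (sum_{i,j} t_ij^2)^(1/2)."""
    return math.sqrt(sum(x * x for row in t for x in row))

def op_norm(t, iters=500):
    """Operator norm l2 -> l2 (largest singular value), estimated by power
    iteration on t^T t; returns |t v| for the final unit vector v."""
    rows, cols = len(t), len(t[0])
    v = [1.0] * cols
    for _ in range(iters):
        tv = [sum(t[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(t[i][j] * tv[i] for i in range(rows)) for j in range(cols)]
        size = math.sqrt(sum(x * x for x in w))
        if size == 0.0:
            return 0.0
        v = [x / size for x in w]
    tv = [sum(t[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in tv))

D = [[3.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
print(op_norm(D), hs_norm(D))    # about 3.0 and sqrt(14), about 3.742
R1 = [[1.0, 2.0], [2.0, 4.0]]    # rank one: both norms equal 5 (up to rounding)
print(op_norm(R1), hs_norm(R1))
```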
We find it convenient to assume that the underlying probability space is a product $(\Omega \times \Omega', \mathrm{P} = \mathrm{P}_0 \otimes \mathrm{P}')$, so that

$$X_t(\omega, \omega') = \sum_{i,j}t_{i,j}g_i(\omega)g'_j(\omega') .$$

Conditionally on $\omega$, $X_t$ is a Gaussian r.v. Denoting by $\mathrm{E}'$ integration in $\omega'$ only (i.e., conditional expectation given $\omega$), we have

$$\mathrm{E}'X_t^2 = \sum_{j\ge 1}\Big(\sum_{i\ge 1}t_{i,j}g_i(\omega)\Big)^2 . \tag{15.5}$$

Consider the r.v.

$$\sigma_t = \sigma_t(\omega) = (\mathrm{E}'X_t^2)^{1/2} ,$$

and note that $\mathrm{E}\sigma_t^2 = \mathrm{E}X_t^2$. The importance of this r.v. is made clear by the fact that the random distance $d_\omega$ associated with the Gaussian process $X_t$ (at given $\omega$) is

$$d_\omega(s, t) = \sigma_{s-t}(\omega) . \tag{15.6}$$

Thus, $d_\omega(s, t)^2 = \sum_{j\ge 1}(\sum_{i\ge 1}(s_{i,j} - t_{i,j})g_i(\omega))^2$. A fundamental difference with the situation of Chap. 11 is that there is no reason why the probability that $d_\omega(s, t)$ is small should be small. A particularly striking case of this is when there is only one non-zero term in the sum over $i$. (The condition (11.8) plays a fundamental role in Chap. 11.)


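Lemma 15.1.2 We have

$$\mathrm{P}\big(|\sigma_t - \|t\|_{HS}| \ge v + L\|t\|\big) \le 2\exp\Big(-\frac{v^2}{2\|t\|^2}\Big) . \tag{15.7}$$

Proof Given $\alpha \in B$, we consider the r.v.

$$g_{\alpha,t} = \sum_{i\ge 1}g_i(\omega)\Big(\sum_{j\ge 1}\alpha_jt_{i,j}\Big) = \sum_{j\ge 1}\alpha_j\Big(\sum_{i\ge 1}t_{i,j}g_i(\omega)\Big) , \tag{15.8}$$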
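so that from (15.4)

$$\sup_{\alpha\in B}g_{\alpha,t} = \sigma_t , \tag{15.9}$$

and also

$$(\mathrm{E}g_{\alpha,t}^2)^{1/2} = \Big(\sum_{i\ge 1}\Big(\sum_{j\ge 1}\alpha_jt_{i,j}\Big)^2\Big)^{1/2} \le \|t\| .$$

Following (15.9), (2.118) implies that for $v > 0$,

$$\mathrm{P}(|\sigma_t - \mathrm{E}\sigma_t| \ge v) \le 2\exp\Big(-\frac{v^2}{2\|t\|^2}\Big) . \tag{15.10}$$

In particular, we have $\|\sigma_t - \mathrm{E}\sigma_t\|_2 \le L\|t\|$, where $\|\cdot\|_2$ denotes the norm in $L^2(\Omega)$. Using the general inequality $|\|X\|_2 - \|Y\|_2| \le \|X - Y\|_2$ yields $|\|\sigma_t\|_2 - \|\mathrm{E}\sigma_t\|_2| \le L\|t\|$, and since $\mathrm{E}\sigma_t \ge 0$, we obtain $|\|\sigma_t\|_2 - \mathrm{E}\sigma_t| \le L\|t\|$. Now

$$\|\sigma_t\|_2 = (\mathrm{E}\sigma_t^2)^{1/2} = (\mathrm{E}X_t^2)^{1/2} = \|t\|_{HS} , \tag{15.11}$$

so that $|\mathrm{E}\sigma_t - \|t\|_{HS}| \le L\|t\|$, and (15.10) implies (15.7). □

We are now ready to prove a simple classical fact (first obtained in [38]).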


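Lemma 15.1.3 For $v > 0$, we have

$$\mathrm{P}(|X_t| \ge v) \le L\exp\Big(-\frac{1}{L}\min\Big(\frac{v^2}{\|t\|_{HS}^2}, \frac{v}{\|t\|}\Big)\Big) . \tag{15.12}$$

Proof Given $\omega$, the r.v. $X_t$ is Gaussian so that

$$\mathrm{P}'(|X_t| \ge v) \le 2\exp\Big(-\frac{v^2}{2\sigma_t^2}\Big) ,$$

and, given $a > 0$,

$$\mathrm{P}(|X_t| \ge v) = \mathrm{E}\,\mathrm{P}'(|X_t| \ge v) \le 2\mathrm{E}\exp\Big(-\frac{v^2}{2\sigma_t^2}\Big) \le 2\exp\Big(-\frac{v^2}{2a^2}\Big) + 2\mathrm{P}(\sigma_t > a) .$$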
We now estimate from above the last term of the previous inequality. It follows from (15.7) that $\mathrm{P}(\sigma_t \ge v + \|t\|_{HS} + L\|t\|) \le L\exp(-v^2/(2\|t\|^2))$. Since $\|t\| \le \|t\|_{HS}$, we have in particular that $\mathrm{P}(\sigma_t \ge v + L_0\|t\|_{HS}) \le L\exp(-v^2/(2\|t\|^2))$, and thus, when $a \ge 2L_0\|t\|_{HS}$,

$$\mathrm{P}(\sigma_t \ge a) \le \mathrm{P}\Big(\sigma_t \ge \frac{a}{2} + L_0\|t\|_{HS}\Big) \le L\exp\Big(-\frac{a^2}{8\|t\|^2}\Big) .$$

Consequently, when $a \ge 2L_0\|t\|_{HS}$,

$$\mathrm{P}(|X_t| \ge v) \le 2\exp\Big(-\frac{v^2}{2a^2}\Big) + L\exp\Big(-\frac{a^2}{8\|t\|^2}\Big) . \tag{15.13}$$

To finish the proof, we take $a = \max(2L_0\|t\|_{HS}, \sqrt{v\|t\|})$. The last term in (15.13) is always at most $L\exp(-v/(L\|t\|))$, and the first term is always at most $L\exp(-v/(L\|t\|)) + L\exp(-v^2/(L\|t\|_{HS}^2))$. □


Consider the two distances on $T$ defined by

$$d_\infty(s, t) = \|t - s\| , \quad d_2(s, t) = \|t - s\|_{HS} . \tag{15.14}$$

As a consequence of (15.12), we have

$$\mathrm{P}(|X_s - X_t| \ge v) \le L\exp\Big(-\frac{1}{L}\min\Big(\frac{v^2}{d_2(s, t)^2}, \frac{v}{d_\infty(s, t)}\Big)\Big) , \tag{15.15}$$

and Theorem 4.5.13 implies the following:

Theorem 15.1.4 For a set $T$ of sequences $(t_{i,j})$, we have

$$S(T) = \mathrm{E}\sup_{t\in T}X_t \le L\big(\gamma_1(T, d_\infty) + \gamma_2(T, d_2)\big) . \tag{15.16}$$
teT 


We analyze now a very interesting example of set T, which will show in 
particular that (15.16) cannot be reversed. Given an integer n, we consider 


T={t; |tl<sl,4j; 405i, j <n}. (15.17) 


Since
$$\sum_{i,j}t_{i,j}g_ig'_j\le\Big(\sum_{i\le n}g_i^2\Big)^{1/2}\Big(\sum_{j\le n}{g'_j}^2\Big)^{1/2}\|t\|,$$
taking the supremum over $t\in T$ and expectation (and using the Cauchy–Schwarz inequality) implies that $S(T)\le n$. Volume arguments show that $\log N(T,d_\infty,1/4)\ge n^2/L$, so that $\gamma_1(T,d_\infty)\ge n^2/L$. It is also simple to prove (see [53]) that
$$\log N(T,d_2,\sqrt n/L)\ge n^2/L,$$
and that $S(T)$ is about $n$, $\gamma_1(T,d_\infty)$ is about $n^2$, and $\gamma_2(T,d_2)$ is about $n^{3/2}$. In this case, (15.16) is not sharp, which means that there is no hope of reversing this inequality in general. This is so despite the fact that we have used a competent chaining method and that the bounds (15.15) are essentially optimal (as follows, e.g., from the left-hand side of (15.69)). It can also be shown that in the case where the elements t of T satisfy $t_{i,j}=0$ for $i\ne j$, the bound (15.16) can be reversed. This is essentially proved in Theorem 8.3.3.
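The value of $S(T)$ for the example (15.17) can be checked numerically. For $\|t\|\le1$ one has $\sum_{i,j}t_{i,j}g_ig'_j\le\|g\|_2\|g'\|_2$ (with $g,g'$ restricted to the first n coordinates), with equality at the rank-one matrix $t=gg'^{\top}/(\|g\|_2\|g'\|_2)$, so $S(T)=(\mathrm E\|g\|_2)^2$, slightly below $n$. The Monte Carlo sketch below is our own illustration; the number of trials and the seed are arbitrary choices.

```python
import numpy as np

def sup_over_T(g, gp):
    """sup of sum_{i,j} t_ij g_i g'_j over ||t|| <= 1, attained at t = g g'^T / (|g| |g'|)."""
    t_star = np.outer(g, gp) / (np.linalg.norm(g) * np.linalg.norm(gp))  # operator norm 1
    return float(g @ t_star @ gp)

def estimate_S(n, trials=4000, seed=0):
    """Monte Carlo estimate of S(T) = E sup_{t in T} X_t for the set T of (15.17)."""
    rng = np.random.default_rng(seed)
    vals = [sup_over_T(rng.standard_normal(n), rng.standard_normal(n)) for _ in range(trials)]
    return float(np.mean(vals))

for n in (10, 40, 160):
    print(n, estimate_S(n))  # close to (and slightly below) n, consistent with S(T) <= n
```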


15.1.2 When T Is Small for the Distance $d_\infty$


We continue the study of general chaos processes. When T is “small for the distance $d_\infty$”, it follows from (15.15) that the process $(X_t)_{t\in T}$ resembles a Gaussian process, so that there should be a close relationship between $S(T)=\mathrm E\sup_{t\in T}X_t$ and $\gamma_2(T,d_2)$. The next result is a step in this direction. It should be compared with Theorem 6.6.1.


Theorem 15.1.5 We have
$$\gamma_2(T,d_2)\le L\Big(S(T)+\sqrt{S(T)\gamma_1(T,d_\infty)}\Big).\tag{15.18}$$
The example (15.17) provides a situation where this inequality is sharp, since then both the left-hand and the right-hand sides are of order $n^{3/2}$. Combining with Theorem 15.1.4, this implies the following:


Corollary 15.1.6 Defining
$$R=\frac{\gamma_1(T,d_\infty)}{\gamma_2(T,d_2)},$$
we have
$$\frac1{L(1+R)}\gamma_2(T,d_2)\le S(T)\le L(1+R)\gamma_2(T,d_2).\tag{15.19}$$
In particular, $S(T)$ is of order $\gamma_2(T,d_2)$ when R is of order 1 or smaller.
Proof The right-hand side is obvious from (15.16). To obtain the left-hand side, we simply write in (15.18) that, since $\sqrt{ab}\le(a+b)/2$,
$$\sqrt{S(T)\gamma_1(T,d_\infty)}=\sqrt{S(T)R\,\gamma_2(T,d_2)}\le\frac12\Big(\frac{\gamma_2(T,d_2)}L+LS(T)R\Big),$$
where L is as in (15.18), and together with (15.18), this yields
$$\gamma_2(T,d_2)\le LS(T)+\frac12\gamma_2(T,d_2)+L^2S(T)R,$$
that is, $\gamma_2(T,d_2)\le2LS(T)(1+LR)$, which is the left-hand side of (15.19). □


In the examples of interest, $\gamma_1(T,d_\infty)$ has a tendency to be large, and the previous results are not sharp. Nonetheless, we will prove Theorem 15.1.5 as an exercise in using functionals in a slightly new way, and the reader who is not interested in this aspect is invited to jump to Proposition 15.1.13. We recall the random distance $d_\omega$ of (15.6).


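Lemma 15.1.7 We have
$$P\Big(d_\omega(s,t)\le\frac12d_2(s,t)\Big)\le L\exp\Big(-\frac{d_2(s,t)^2}{L\,d_\infty(s,t)^2}\Big).\tag{15.20}$$
Proof Taking $v=\|t\|_{HS}/4$ in (15.7), we obtain
$$P\Big(\big|\sigma_t-\|t\|_{HS}\big|\ge\frac{\|t\|_{HS}}4+L_1\|t\|\Big)\le2\exp\Big(-\frac{\|t\|_{HS}^2}{32\|t\|^2}\Big).$$
When $L_1\|t\|\le\|t\|_{HS}/4$, this gives
$$P\Big(\sigma_t\le\frac{\|t\|_{HS}}2\Big)\le L\exp\Big(-\frac{\|t\|_{HS}^2}{L\|t\|^2}\Big),\tag{15.21}$$
whereas when $L_1\|t\|>\|t\|_{HS}/4$, (15.21) holds automatically if the constant in front of the exponential is large enough. Applying (15.21) to $s-t$ in place of t yields (15.20), since $d_\omega(s,t)=\sigma_{s-t}$, $d_2(s,t)=\|s-t\|_{HS}$ and $d_\infty(s,t)=\|s-t\|$. □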


We will deduce Theorem 15.1.5 from the following general abstract result:
Theorem 15.1.8 Consider a finite set T, provided with two distances d and $d_1$. Consider a random distance $d_\omega$ on T and a number $0<\alpha\le1/2$. Assume that
$$\forall s,t\in T,\quad P\big(d_\omega(s,t)\ge\alpha d(s,t)\big)\ge\alpha,\tag{15.22}$$
$$\forall s,t\in T,\quad P\big(d_\omega(s,t)\le\alpha d(s,t)\big)\le\frac1\alpha\exp\Big(-\alpha\frac{d^2(s,t)}{d_1^2(s,t)}\Big).\tag{15.23}$$


464 15 Gaussian Chaos 


Consider a number M such that
$$P\big(\gamma_2(T,d_\omega)\le M\big)\ge1-\alpha/2.\tag{15.24}$$
Then
$$\gamma_2(T,d)\le K(\alpha)\Big(M+\sqrt{M\gamma_1(T,d_1)}\Big),\tag{15.25}$$
where $K(\alpha)$ depends on α only.


Proof of Theorem 15.1.5 We first prove that the pair of distances $d_1=d_\infty$ and $d=d_2$ of (15.14) and the random distance $d_\omega(s,t)=\sigma_{s-t}(\omega)$ of (15.6) satisfy (15.22) and (15.23) whenever α is small enough. For (15.23), this is a consequence of (15.20). Next, the formula (15.9) makes $\sigma_t$, and hence $\sigma_{s-t}$, appear as the supremum of a Gaussian process. Applying the Paley–Zygmund inequality (6.15) to this process yields $P\big(\sigma_{s-t}\ge(\mathrm E\sigma_{s-t}^2)^{1/2}/L\big)\ge1/L$. Since $\mathrm E\sigma_{s-t}^2=\|s-t\|_{HS}^2=d_2(s,t)^2$, (15.22) holds whenever α is small enough.

Next, we prove that (15.24) holds for $M=LS(T)/\alpha$. Since $\mathrm E\mathrm E'\sup_{t\in T}X_t=S(T)$, and since $\mathrm E'\sup_{t\in T}X_t\ge0$, Markov's inequality implies
$$P\Big(\mathrm E'\sup_{t\in T}X_t\le2S(T)/\alpha\Big)\ge1-\alpha/2.$$
Since $L\,\mathrm E'\sup_{t\in T}X_t\ge\gamma_2(T,d_\omega)$ by Theorem 2.10.1, this proves that (15.24) holds for $M=LS(T)/\alpha$. Thus, (15.25) holds by Theorem 15.1.8, and it implies (15.18). □


It would be nice to have a proof of Theorem 15.1.8 which falls into our general 
scheme of proof. We do not know how to do that. The next exercise shows how to 
obtain a weaker result in the same direction following this scheme of proof. 


Exercise 15.1.9

(a) Consider an admissible sequence $(\mathcal D_n)$ of partitions of T and a probability measure μ on T. For each $t\in T$, we define $\eta_{0,\omega}(t)=\Delta(T,d_\omega)$, and for $n\ge1$, we define $\eta_{n,\omega}(t)=\Delta(T,d_\omega)$ if $\mu(D_n(t))<1/N_{n+1}$, and otherwise we define
$$\eta_{n,\omega}(t)=\inf\Big\{\epsilon>0;\ \mu\big(D_n(t)\cap B_{d_\omega}(t,\epsilon)\big)\ge\frac1{N_{n+2}}\Big\}.\tag{15.26}$$
Prove that
$$\int\sum_{n\ge0}2^{n/2}\eta_{n,\omega}(t)\,d\mu(t)\le L\gamma_2(T,d_\omega).\tag{15.27}$$

(b) Set $\epsilon_n(t)=\inf\{\epsilon>0;\ \mu(B_d(t,\epsilon))\ge1/N_{n+3}\}$. Prove that
$$\epsilon_n(t)\le K2^{n/2}\Delta(D_n(t),d_1)+K\,\mathrm E\eta_{n,\omega}(t),\tag{15.28}$$
and conclude. Hint: Review Proposition 3.3.1 for (a) and Theorem 5.4.1 for (b).


As a warm-up for the proof of Theorem 15.1.8, we recommend that the reader master Exercise 3.4.2. Our proof of Theorem 15.1.8 will use the following functional, related to that exercise:

Definition 15.1.10 Consider a number M as in (15.24). For any set $H\subset T$ and any probability measure μ on T with $\mu(H)=1$, we define
$$F(\mu,H)=\mathrm E\Big[\mathbf 1_U\inf\int\sum_{n\ge0}2^{n/2}\min\big(\Delta(A_n(t),d_\omega),\Delta(A_n(t),d)\big)\,d\mu(t)\Big],\tag{15.29}$$
where U is the set $\{\gamma_2(T,d_\omega)\le M\}$ and where the infimum is computed over all admissible sequences $(\mathcal A_n)$ of partitions of H. For any set $H\subset T$, we then define
$$F(H)=\sup\{F(\mu,H);\ \mu(H)=1\}.$$
Lemma 15.1.11 We have
$$\Delta(T,d)\le KF(T).\tag{15.30}$$
Proof Considering just the term for $n=0$ in (15.29), and since $\mathcal A_0=\{T\}$, we get that for any measure μ,
$$F(\mu,T)\ge\mathrm E\big[\mathbf 1_U\min\big(\Delta(T,d_\omega),\Delta(T,d)\big)\big].$$
Using (15.22) for $s,t$ with $d(s,t)\ge\Delta(T,d)/2$, we obtain that $P(\Delta(T,d_\omega)\ge\alpha\Delta(T,d)/2)\ge\alpha$. Since $P(U)\ge1-\alpha/2$ by (15.24), it follows that $P\big(U\cap\{\Delta(T,d_\omega)\ge\alpha\Delta(T,d)/2\}\big)\ge\alpha/2$, so that $F(\mu,T)\ge\Delta(T,d)/K$ by (2.7). □


The following lemma provides the appropriate growth condition for the functional F:

Lemma 15.1.12 Assume the conditions of Theorem 15.1.8. There exists a constant $K_0=K_0(\alpha)$ with the following property. Consider an integer $m\ge2$. Consider a set $D\subset T$ with $\Delta(D,d_1)\le2a/(K_0\sqrt{\log m})$, and for $\ell\le m$, consider points $t_\ell\in D$ that satisfy $d(t_\ell,t_{\ell'})\ge a$ for $\ell\ne\ell'$. Consider moreover for $\ell\le m$ sets $H_\ell\subset B(t_\ell,a/K_0)$. Then
$$F\Big(\bigcup_{\ell\le m}H_\ell\Big)\ge\frac aK\sqrt{\log m}+\min_{\ell\le m}F(H_\ell).\tag{15.31}$$
Proof of Theorem 15.1.8 We repeat the proof of Theorem 6.6.1, using the functional $F(H)$ rather than the functional $b(H)$ and the distance $K_0d_1$ rather than the distance $d_\infty$. The only two properties of the functional $b(H)$ that were used in the proof of Theorem 6.6.1 are those proved in Lemmas 15.1.11 and 15.1.12. Thus, we obtain that $\gamma_2(T,d)\le K(\alpha)\big(F(T)+\sqrt{F(T)\gamma_1(T,d_1)}\big)$. But it is obvious that $F(T)\le\mathrm E[\mathbf 1_U\gamma_2(T,d_\omega)]\le M$. □


Proof of Lemma 15.1.12 Let us fix for each $\ell\le m$ a probability measure $\mu_\ell$ with $\mu_\ell(H_\ell)=1$, and let $\mu=m^{-1}\sum_{\ell\le m}\mu_\ell$. For $t\in H:=\bigcup_{\ell\le m}H_\ell$, let us define $\ell(t)\le m$ by $t\in H_{\ell(t)}$. Let us assume first that $m\ge32$. Let us then consider the largest integer $n_0$ with $N_{n_0}\le\sqrt{m/32}$, so that $2^{n_0/2}\ge\sqrt{\log m}/K$. Considering a given ω and a given admissible sequence $(\mathcal A_n)$ of partitions of H, for $t\in H$, let us define
$$f(t,\omega,(\mathcal A_n))=\min\big(\Delta(A_{n_0}(t),d_\omega),\Delta(A_{n_0}(t),d)\big)-\min\big(\Delta(A_{n_0}(t)\cap H_{\ell(t)},d_\omega),\Delta(A_{n_0}(t)\cap H_{\ell(t)},d)\big).\tag{15.32}$$
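Thus, given the admissible sequence $(\mathcal A_n)$, we have
$$\sum_{n\ge0}2^{n/2}\min\big(\Delta(A_n(t),d_\omega),\Delta(A_n(t),d)\big)\ge2^{n_0/2}f(t,\omega,(\mathcal A_n))+\sum_{n\ge0}2^{n/2}\min\big(\Delta(A_n(t)\cap H_{\ell(t)},d_\omega),\Delta(A_n(t)\cap H_{\ell(t)},d)\big).\tag{15.33}$$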


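Now, for each ℓ, the sets $A\cap H_\ell$ for $A\in\mathcal A_n$ form a partition $\mathcal A_{n,\ell}$ of $H_\ell$, and the sequence $(\mathcal A_{n,\ell})_{n\ge0}$ is an admissible sequence of partitions of $H_\ell$, so that
$$\begin{aligned}&\int\sum_{n\ge0}2^{n/2}\min\big(\Delta(A_n(t)\cap H_{\ell(t)},d_\omega),\Delta(A_n(t)\cap H_{\ell(t)},d)\big)\,d\mu(t)\\&\quad=\frac1m\sum_{\ell\le m}\int\sum_{n\ge0}2^{n/2}\min\big(\Delta(A_n(t)\cap H_\ell,d_\omega),\Delta(A_n(t)\cap H_\ell,d)\big)\,d\mu_\ell(t)\\&\quad\ge\frac1m\sum_{\ell\le m}\inf\int\sum_{n\ge0}2^{n/2}\min\big(\Delta(B_n(t),d_\omega),\Delta(B_n(t),d)\big)\,d\mu_\ell(t),\end{aligned}\tag{15.34}$$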


where the last infimum is taken over all admissible sequences of partitions $(\mathcal B_n)$ of $H_\ell$. Integrating (15.33) with respect to $d\mu$, combining with (15.34), taking the infimum over the choice of the sequence $(\mathcal A_n)$, multiplying by $\mathbf 1_U$, and taking expectation, we obtain
$$F(\mu,H)\ge2^{n_0/2}\,\mathrm E\Big[\mathbf 1_U\inf_{(\mathcal A_n)}\int f(t,\omega,(\mathcal A_n))\,d\mu(t)\Big]+\frac1m\sum_{\ell\le m}F(\mu_\ell,H_\ell).\tag{15.35}$$
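The goal now is to bound from below the first term on the right-hand side of (15.35). Consider the set
$$B=\{(s,t)\in H\times H;\ d(s,t)\ge a/2\}=H\times H\setminus\bigcup_{\ell\le m}H_\ell\times H_\ell.$$
Since the sets $H_\ell\times H_\ell$ are disjoint and satisfy $\mu\otimes\mu(H_\ell\times H_\ell)=1/m^2$, we have $\mu\otimes\mu(B)=1-1/m$. Given $(s,t)\in B$, it follows from (15.23) (and since $d_1(s,t)\le2a/(K_0\sqrt{\log m})$) that
$$P\Big(d_\omega(s,t)\le\frac{\alpha a}2\Big)\le\frac1\alpha\exp\Big(-\frac{\alpha K_0^2\log m}{16}\Big).$$
If $K_0$ is large enough, the right-hand side is $\le1/(4m)$, so that then
$$\mathrm E\,\mu\otimes\mu\Big(\Big\{(s,t)\in B;\ d_\omega(s,t)\le\frac{\alpha a}2\Big\}\Big)\le\frac1{4m}.$$
It then follows from Markov's inequality that the event $U_0$ defined by
$$U_0=\Big\{\mu\otimes\mu\Big(\Big\{(s,t)\in B;\ d_\omega(s,t)\le\frac{\alpha a}2\Big\}\Big)\le\frac1m\Big\}$$
has probability $\ge3/4$. Since
$$\Big\{(s,t)\in H\times H;\ d_\omega(s,t)\le\frac{\alpha a}2\Big\}\subset\Big\{(s,t)\in B;\ d_\omega(s,t)\le\frac{\alpha a}2\Big\}\cup(H\times H\setminus B),$$
and since $\mu\otimes\mu(H\times H\setminus B)\le1/m$, when $U_0$ occurs, we have
$$\mu\otimes\mu\Big(\Big\{(s,t)\in H\times H;\ d_\omega(s,t)\le\frac{\alpha a}2\Big\}\Big)\le\frac2m.\tag{15.36}$$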


Assuming that (15.36) holds and that $K_0$ is large enough, we prove that
$$\inf_{(\mathcal A_n)}\int f(t,\omega,(\mathcal A_n))\,d\mu(t)\ge\frac aK.\tag{15.37}$$
Combining with (15.35), using that $P(U\cap U_0)\ge1/2$ (since $P(U)\ge1-\alpha/2\ge3/4$ and $P(U_0)\ge3/4$), and taking the supremum over the choice of the measures $\mu_\ell$ completes the proof of (15.31).
We start the proof of (15.37). Let
$$B_1=\bigcup\{A\in\mathcal A_{n_0};\ \Delta(A,d_\omega)\le\alpha a/2\};\qquad B_2=\bigcup\{A\in\mathcal A_{n_0};\ \Delta(A,d)\le a/2\}.\tag{15.38}$$




If $A\in\mathcal A_{n_0}$ satisfies $A\subset B_1$, then $\Delta(A,d_\omega)\le\alpha a/2$ and $A\times A\subset\{(s,t);\ d_\omega(s,t)\le\alpha a/2\}$. Thus, by (15.36), it holds that $\mu\otimes\mu(A\times A)\le2/m$, so that $\mu(A)\le\sqrt{2/m}$. Since $B_1$ is the union of at most $N_{n_0}\le\sqrt{m/32}$ such sets, we have $\mu(B_1)\le N_{n_0}\sqrt{2/m}\le1/4$. Next, if $A\in\mathcal A_{n_0}$ satisfies $A\subset B_2$, then $\Delta(A,d)\le a/2$. Since $d(t_\ell,t_{\ell'})\ge a$ for $\ell\ne\ell'$, A is entirely contained in a set $H_\ell$, so that $\mu(A)\le1/m$. Since $B_2$ is the union of at most $N_{n_0}$ such sets, we have $\mu(B_2)\le N_{n_0}/m\le1/4$.

Denoting by C the complement of $B_1\cup B_2$, we have shown that $\mu(C)\ge1/2$. Now, by definition of $B_1$ and $B_2$, if $t\in C$, then
$$\Delta(A_{n_0}(t),d_\omega)\ge\alpha a/2;\qquad\Delta(A_{n_0}(t),d)\ge a/2,$$
so that since $\Delta(H_\ell,d)\le2a/K_0$, we obtain for such t that
$$f(t,\omega,(\mathcal A_n))\ge a\big(\min(\alpha/2,1/2)-2/K_0\big)\ge a/K,\tag{15.39}$$
if $K_0$ has been taken large enough. Thus, $f(t,\omega,(\mathcal A_n))\ge a/K$ on a set of measure $\ge1/2$ (and $f\ge0$ everywhere, since $A_{n_0}(t)\cap H_{\ell(t)}\subset A_{n_0}(t)$), and this completes the proof of (15.37) and concludes the argument when $m\ge32$.

Let us now consider the case where $m<32$. Then we set $n_0=0$, and we proceed in a similar but much simpler manner. We define $f(t,\omega,(\mathcal A_n))$ as in (15.32) so that
$$f(t,\omega,(\mathcal A_n))\ge\min\big(\Delta(H,d_\omega),\Delta(H,d)\big)-\Delta(H_{\ell(t)},d),$$
and it suffices to use (15.23) to prove that with high probability, we have $\Delta(H,d_\omega)\ge a/K$ to conclude as previously. □


15.1.3 Covering Numbers 


Let us give a simple consequence of Theorem 15.1.5. We recall the covering numbers $N(T,d,\epsilon)$ of Sect. 1.4. We recall that $S(T)=\mathrm E\sup_{t\in T}X_t$.


Proposition 15.1.13 There exists a constant L with the following property:
$$\epsilon\ge L\sqrt{\Delta(T,d_\infty)S(T)}\ \Rightarrow\ \epsilon\sqrt{\log N(T,d_2,\epsilon)}\le LS(T).\tag{15.40}$$
A remarkable feature of (15.40) is that, as we shall now prove, the right-hand side need not hold if $\epsilon\le\sqrt{\Delta(T,d_\infty)S(T)}/L$ (see however (15.43)). To see this, let us consider the example (15.17). For $\epsilon=\sqrt n/L$, we have $\epsilon\sqrt{\log N(T,d_2,\epsilon)}\ge n^{3/2}/L$, while $S(T)\le Ln$, so that the right-hand side of (15.40) does not hold. Moreover, since $\Delta(T,d_\infty)=2$, we have $\epsilon\ge\sqrt{\Delta(T,d_\infty)S(T)}/L$. This shows that the condition $\epsilon\ge L\sqrt{\Delta(T,d_\infty)S(T)}$ in (15.40) is rather precise.
Proof of Proposition 15.1.13 To lighten notation, we set $\Delta=\Delta(T,d_\infty)$. Consider $\epsilon>0$ and a finite set $T'\subset T$ such that
$$\forall s,t\in T',\ s\ne t,\quad d_2(s,t)=\|t-s\|_{HS}\ge\epsilon.\tag{15.41}$$
Let $m=\mathrm{card}\,T'$. Thus, $N(T',d_2,\epsilon/2)\ge m$, so that by Exercise 2.7.8(b), we have $\gamma_2(T',d_2)\ge\epsilon\sqrt{\log m}/L$. Next, we have $\gamma_1(T',d_\infty)\le L\Delta\log m$. This is witnessed by an admissible sequence $(\mathcal A_n)$ such that when $N_n\ge m$, each set $A\in\mathcal A_n$ contains exactly one point (see Exercise 2.7.5). Now (15.18) implies
$$\frac\epsilon L\sqrt{\log m}\le\gamma_2(T',d_2)\le L\Big(S(T')+\sqrt{S(T')\gamma_1(T',d_\infty)}\Big)\le L\Big(S(T)+\sqrt{S(T)\Delta\log m}\Big).\tag{15.42}$$
Let us denote by $L_2$ the constant in the previous inequality. Now, if $\epsilon\ge L_3\sqrt{\Delta S(T)}$ where $L_3=2(L_2)^2$, we have $\sqrt{S(T)\Delta\log m}\le\epsilon\sqrt{\log m}/L_3$, so that (15.42) implies
$$\frac\epsilon{L_2}\sqrt{\log m}\le L_2S(T)+\frac1{2L_2}\epsilon\sqrt{\log m},$$
and therefore $\epsilon\sqrt{\log m}\le LS(T)$. Assuming $m=\mathrm{card}\,T'$ as large as possible, the balls centered at the points of $T'$ of radius ε cover T and $N(T,d_2,\epsilon)\le m$. □


The proof of Proposition 15.1.13 does not use the full strength of Theorem 15.1.8, and we propose the following as a very challenging exercise:

Exercise 15.1.14 Find a direct proof that under the conditions of Theorem 15.1.8, one has
$$\epsilon\ge L\sqrt{M\Delta(T,d_1)}\ \Rightarrow\ \epsilon\sqrt{\log N(T,d,\epsilon)}\le LM,$$
and use this result to find a more direct proof of Proposition 15.1.13.


For completeness, let us mention the following, which should be compared with (15.40):

Proposition 15.1.15 For each $\epsilon>0$, we have
$$\epsilon\big(\log N(T,d_2,\epsilon)\big)^{1/4}\le LS(T).\tag{15.43}$$
In the previous example (15.17), both sides are of order n for $\epsilon=\sqrt n/L$.

Research Problem 15.1.16 Is it true that
$$\epsilon\sqrt{\log N(T,d_\infty,\epsilon)}\le LS(T)\ ?\tag{15.44}$$
For a partial result, and a proof of Proposition 15.1.15, see [109].
Exercise 15.1.17 Prove that (15.43) is true if (15.44) always holds.


15.1.4 Another Way to Bound S(T) 


Next, we describe a way to control $S(T)$ from above that is really different from the method of Theorem 15.1.4. Given a convex balanced subset U of $\ell^2$ (i.e., $\lambda U\subset U$ for $|\lambda|\le1$, or, equivalently, $U=-U$), we define
$$g(U)=\mathrm E\sup_{(u_i)\in U}\sum_{i\ge1}u_ig_i,\qquad\sigma(U)=\sup_{(u_i)\in U}\Big(\sum_{i\ge1}u_i^2\Big)^{1/2}.$$
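To get a feel for g(U) and σ(U), here is a Monte Carlo sketch for two convex balanced sets chosen by us for illustration: the unit balls of $\ell^1_n$ and of $\ell^2_n$ (viewed inside $\ell^2$). For the first, $\sup_{u\in U}\sum_iu_ig_i=\max_{i\le n}|g_i|$; for the second it is $\|g\|_2$. In both cases σ(U) = 1, yet g(U) grows like $\sqrt{\log n}$ in the first case and like $\sqrt n$ in the second. Trial counts and seeds are arbitrary.

```python
import numpy as np

def g_l1_ball(n, trials=2000, seed=0):
    """Monte Carlo estimate of g(U) for U = unit ball of l^1_n: E max_i |g_i|."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((trials, n))
    return float(np.abs(G).max(axis=1).mean())

def g_l2_ball(n, trials=2000, seed=0):
    """Monte Carlo estimate of g(U) for U = unit ball of l^2_n: E ||g||_2."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((trials, n))
    return float(np.linalg.norm(G, axis=1).mean())

# sigma(U) = 1 in both cases (the largest Euclidean norm of a point of U).
for n in (10, 100, 1000):
    print(n, g_l1_ball(n), np.sqrt(2 * np.log(2 * n)), g_l2_ball(n), np.sqrt(n))
```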
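Given convex balanced subsets U and V of $\ell^2$, we define
$$T_{U,V}=\Big\{t=(t_{i,j})_{i,j\ge1};\ \forall(x_i)_{i\ge1},\ \forall(y_j)_{j\ge1},\ \sum_{i,j}t_{i,j}x_iy_j\le\sup_{(u_i)\in U}\sum_{i\ge1}x_iu_i\ \sup_{(v_j)\in V}\sum_{j\ge1}y_jv_j\Big\}.$$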


This is a generalization of the example (15.17) to norms other than the Euclidean norm. It follows from (2.118) that, if $w>0$,
$$P\Big(\sup_{(u_i)\in U}\sum_{i\ge1}g_iu_i\ge g(U)+w\sigma(U)\Big)\le2\exp\Big(-\frac{w^2}2\Big),$$
so that (using that for positive numbers, when $ab\ge cd$, we have either $a\ge c$ or $b\ge d$)
$$P\Big(\sup_{(u_i)\in U}\sum_{i\ge1}g_iu_i\ \sup_{(v_j)\in V}\sum_{j\ge1}g'_jv_j\ge\big(g(U)+w\sigma(U)\big)\big(g(V)+w\sigma(V)\big)\Big)\le4\exp\Big(-\frac{w^2}2\Big).$$
Now,
$$\sup_{t\in T_{U,V}}X_t\le\sup_{(u_i)\in U}\sum_{i\ge1}u_ig_i\ \sup_{(v_j)\in V}\sum_{j\ge1}v_jg'_j,$$
so that, whenever $g(U),g(V)\le1$ and $\sigma(U),\sigma(V)\le2^{-n/2}$, we obtain
$$P\Big(\sup_{t\in T_{U,V}}X_t\ge(1+2^{-n/2}w)^2\Big)\le4\exp\Big(-\frac{w^2}2\Big).$$
Changing w into $2^{n/2}w$, this yields
$$P\Big(\sup_{t\in T_{U,V}}X_t\ge(1+w)^2\Big)\le4\exp(-2^{n-1}w^2).\tag{15.45}$$


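Proposition 15.1.18 Consider for $n\ge0$ a family $\mathcal C_n$ of pairs $(U,V)$ of convex balanced subsets of $\ell^2$. Assume that $\mathrm{card}\,\mathcal C_n\le N_n$ and that
$$\forall(U,V)\in\mathcal C_n,\quad g(U),g(V)\le1;\quad\sigma(U),\sigma(V)\le2^{-n/2}.$$
Then, the set
$$T=\mathrm{conv}\Big(\bigcup_{n\ge0}\ \bigcup_{(U,V)\in\mathcal C_n}T_{U,V}\Big)$$
satisfies $S(T)\le L$.
Proof Since $X_t$ depends linearly on t, the supremum of $X_t$ over T equals its supremum over the union of the sets $T_{U,V}$. It then follows from (15.45) that for $w\ge2$,
$$P\Big(\sup_{t\in T}X_t\ge(1+w)^2\Big)\le\sum_{n\ge0}\ \sum_{(U,V)\in\mathcal C_n}P\Big(\sup_{t\in T_{U,V}}X_t\ge(1+w)^2\Big)\le\sum_{n\ge0}4N_n\exp(-2^{n-1}w^2)\le L\exp(-w^2/4),$$
and the conclusion follows by (2.6) as usual. □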
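The last inequality in the proof rests on the double-exponential growth of $N_n$. As a sanity check (a numerical illustration on a grid, not a proof), the sketch below bounds $N_n$ by $2^{2^n}$ for every $n\ge0$, evaluates $\sum_{n\ge0}4N_n\exp(-2^{n-1}w^2)$ term by term in log scale, and compares it with $\exp(-w^2/4)$ for $2\le w\le10$. The truncation level `n_max` is an arbitrary choice, harmless because the terms decay doubly exponentially.

```python
import math

def series(w, n_max=60):
    """sum_{n>=0} 4 * 2^(2^n) * exp(-2^(n-1) w^2), each term computed in log scale."""
    total = 0.0
    for n in range(n_max + 1):
        log_term = math.log(4.0) + (2 ** n) * math.log(2.0) - (2 ** n / 2) * w * w
        total += math.exp(log_term)  # underflows harmlessly to 0.0 for large n
    return total

grid = [2 + 0.1 * k for k in range(81)]  # w from 2 to 10
ratios = [series(w) / math.exp(-w * w / 4) for w in grid]
print(max(ratios))  # roughly 3.8, attained at w = 2
```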


15.1.5 Yet Another Way to Bound S(T) 


We end this section with a result involving a very special class of chaos, which we will bound by a method which is apparently different both from the method of Theorem 15.1.4 and from the method used for the set (15.17). To lighten notation, we denote by tg the sequence $\big(\sum_{j\ge1}t_{i,j}g_j\big)_{i\ge1}$, by $\langle\cdot,\cdot\rangle$ the dot product in $\ell^2$, and by $\|\cdot\|_2$ the corresponding norm. For $t=(t_{i,j})$, let us write
$$Y_t^*:=\sum_{i\ge1}\Big(\sum_{j\ge1}t_{i,j}g_j\Big)^2=\|tg\|_2^2=\langle tg,tg\rangle=\sum_{i\ge1}\sum_{j,k\ge1}t_{i,j}t_{i,k}g_jg_k\tag{15.46}$$
and
$$Y_t=Y_t^*-\mathrm EY_t^*=\sum_{i\ge1}\sum_{j\ne k}t_{i,j}t_{i,k}g_jg_k+\sum_{i\ge1}\sum_{j\ge1}t_{i,j}^2(g_j^2-1),\tag{15.47}$$
which is a chaos of order 2.
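Since the $g_j$ are independent standard Gaussians, $\mathrm EY_t^*=\sum_{i,j}t_{i,j}^2=\|t\|_{HS}^2$, which is why $Y_t$ in (15.47) is centered. A short Monte Carlo sketch, assuming a finite random array t chosen only for illustration (the helper name `y_star` is ours):

```python
import numpy as np

def y_star(t, g):
    """Y*_t = ||t g||_2^2 = sum_i (sum_j t_ij g_j)^2, as in (15.46)."""
    return float(np.sum((t @ g) ** 2))

rng = np.random.default_rng(1)
t = rng.standard_normal((5, 8))   # an arbitrary finite array t = (t_ij)
samples = [y_star(t, rng.standard_normal(8)) for _ in range(20000)]
hs2 = float(np.sum(t ** 2))       # ||t||_HS^2, the exact value of E Y*_t
print(np.mean(samples), hs2)      # so Y_t = Y*_t - ||t||_HS^2 has mean zero
```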


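Theorem 15.1.19 ([44]) For any set T with $0\in T$, we have
$$\mathrm E\sup_{t\in T}|Y_t|\le L\gamma_2(T,d_\infty)\Big(\gamma_2(T,d_\infty)+\sup_{t\in T}\|t\|_{HS}\Big).\tag{15.48}$$
Let us define, with obvious notation,
$$Z_t=\sum_{i,j,k\ge1}t_{i,j}t_{i,k}g_jg'_k=\langle tg,tg'\rangle.$$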
i,jkE1 


The main step of the proof of Theorem 15.1.19 is as follows. 


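Proposition 15.1.20 Let $U^2:=\mathrm E\sup_{t\in T}\|tg\|_2^2$. Then
$$\mathrm E\sup_{t\in T}|Z_t|\le LU\gamma_2(T,d_\infty).\tag{15.49}$$
Proposition 15.1.21 We set $V=\sup_{t\in T}\|t\|_{HS}$. Then
$$U\le L\big(V+\gamma_2(T,d_\infty)\big).\tag{15.50}$$
Proof We have $V^2=\sup_{t\in T}\|t\|_{HS}^2=\sup_{t\in T}\sum_{i,j}t_{i,j}^2=\sup_{t\in T}\mathrm EY_t^*$. For $t\in T$, we have $\|tg\|_2^2=Y_t^*=Y_t+\mathrm EY_t^*\le Y_t+V^2$, and thus,
$$U^2\le V^2+\mathrm E\sup_{t\in T}|Y_t|.\tag{15.51}$$
Now, combining (15.2) and (15.47), we have
$$\mathrm E\sup_{t\in T}|Y_t|\le2\,\mathrm E\sup_{t\in T}|Z_t|.\tag{15.52}$$
Combining with (15.51) and (15.49), we obtain $U^2\le V^2+LU\gamma_2(T,d_\infty)$, and this proves (15.50) (since $LU\gamma_2(T,d_\infty)\le U^2/2+L^2\gamma_2(T,d_\infty)^2/2$). □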
Research Problem 15.1.22 The statement (15.50) is basically a bound on $\mathrm E\sup_{t\in T,\|x\|_2\le1}\langle x,tg\rangle$, the supremum of a Gaussian process. Provide a direct construction of an admissible sequence on the index set which witnesses this bound. Recalling that t is viewed as an operator on $\ell^2$, denote by $t^*$ its adjoint, so that $\langle x,tg\rangle=\langle t^*x,g\rangle$. Prove that $\gamma_2(H,d_2)\le L(V+\gamma_2(T,d_\infty))$ where $H=\{t^*x;\ t\in T,\ \|x\|_2\le1\}$ and $d_2$ denotes here the distance of $\ell^2$.


Proof of Theorem 15.1.19 Plug (15.50) into (15.49), and use (15.52). □
Proof of Proposition 15.1.20 Without loss of generality, we assume that T is finite. Consider an admissible sequence $(\mathcal A_n)$ with
$$\sup_{t\in T}\sum_{n\ge0}2^{n/2}\Delta(A_n(t))\le2\gamma_2(T,d_\infty),$$
where the diameter Δ is for the distance $d_\infty$. For $A\in\mathcal A_n$, consider an element $t_{A,n}\in A$, and define as usual a chaining by $\pi_n(t)=t_{A_n(t),n}$. Since $0\in T$, without loss of generality, we may assume that $\pi_0(t)=0$. We observe that

$$Z_{\pi_n(t)}-Z_{\pi_{n-1}(t)}=\big\langle(\pi_n(t)-\pi_{n-1}(t))g,\pi_n(t)g'\big\rangle+\big\langle\pi_{n-1}(t)g,(\pi_n(t)-\pi_{n-1}(t))g'\big\rangle.\tag{15.53}$$


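Recalling that we think of each t as an operator on $\ell^2$, let us denote by $t^*$ its adjoint. Thus,
$$\big\langle(\pi_n(t)-\pi_{n-1}(t))g,\pi_n(t)g'\big\rangle=\big\langle g,(\pi_n(t)-\pi_{n-1}(t))^*\pi_n(t)g'\big\rangle.\tag{15.54}$$
Here, $(\pi_n(t)-\pi_{n-1}(t))^*\pi_n(t)g'$ is the element of $\ell^2$ obtained by applying the operator $(\pi_n(t)-\pi_{n-1}(t))^*$ to the vector $\pi_n(t)g'$. Let us now consider the r.v.s $W=\sup_{t\in T}\|tg\|_2$ and $W'=\sup_{t\in T}\|tg'\|_2$. Then, since an operator and its adjoint have the same norm and since $\pi_n(t),\pi_{n-1}(t)\in A_{n-1}(t)$,
$$\big\|(\pi_n(t)-\pi_{n-1}(t))^*\pi_n(t)g'\big\|_2\le\big\|(\pi_n(t)-\pi_{n-1}(t))^*\big\|\,\|\pi_n(t)g'\|_2\le\Delta(A_{n-1}(t))W'.$$
It then follows from (15.54) that, conditionally on $g'$, the quantity $\langle(\pi_n(t)-\pi_{n-1}(t))g,\pi_n(t)g'\rangle$ is a Gaussian r.v. G with $(\mathrm EG^2)^{1/2}\le\Delta(A_{n-1}(t))W'$. Thus, we obtain that for $u\ge1$
$$P\Big(\big|\big\langle(\pi_n(t)-\pi_{n-1}(t))g,\pi_n(t)g'\big\rangle\big|\ge2^{n/2}u\Delta(A_{n-1}(t))W'\Big)\le\exp(-u^22^n/2).$$
Proceeding in a similar fashion for the second term in (15.53), we get
$$P\Big(\big|Z_{\pi_n(t)}-Z_{\pi_{n-1}(t)}\big|\ge2^{n/2}u\Delta(A_{n-1}(t))(W+W')\Big)\le2\exp(-u^22^n/2).$$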
Using that $Z_{\pi_0(t)}=0$, and proceeding just as in the proof of the generic chaining bound (2.33), we obtain that for $u\ge L$,
$$P\Big(\sup_{t\in T}|Z_t|\ge Lu\gamma_2(T,d_\infty)(W+W')\Big)\le L\exp(-u^2).$$
In particular, the function $R=\sup_{t\in T}|Z_t|/(W+W')$ satisfies $\mathrm ER^2\le L\gamma_2(T,d_\infty)^2$. Since $\sup_{t\in T}|Z_t|=R(W+W')$ and $\mathrm EW^2=\mathrm EW'^2=U^2$, the Cauchy–Schwarz inequality yields (15.49). □
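The two facts used in the chaining step, the adjoint identity (15.54) and the bound on the norm of $(\pi_n(t)-\pi_{n-1}(t))^*\pi_n(t)g'$, can be checked numerically for finite real matrices, where the adjoint is the transpose. In this sketch the random matrices `s` and `t` merely stand in for $\pi_{n-1}(t)$ and $\pi_n(t)$; they are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 6
s, t = rng.standard_normal((m, m)), rng.standard_normal((m, m))  # stand-ins for pi_{n-1}(t), pi_n(t)
g, gp = rng.standard_normal(m), rng.standard_normal(m)

lhs = ((t - s) @ g) @ (t @ gp)                 # <(pi_n - pi_{n-1}) g, pi_n g'>
rhs = g @ ((t - s).T @ (t @ gp))               # <g, (pi_n - pi_{n-1})^* pi_n g'>, cf. (15.54)
sd = np.linalg.norm((t - s).T @ (t @ gp))      # conditional standard deviation given g'
bound = np.linalg.norm(t - s, 2) * np.linalg.norm(t @ gp)  # operator norm times ||pi_n g'||
print(lhs, rhs, sd <= bound)
```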


Having found three distinct ways, (15.16), Proposition 15.1.18, and (15.48), of controlling $S(T)$, one should certainly ask whether there are more. It simply seems difficult to even make a sensible conjecture about what might be the “most general way to bound a chaos process”.


15.2 Tails of Multiple-Order Gaussian Chaos 


In this section, we consider a single-order d (decoupled) Gaussian chaos, that is, a r.v. X of the type
$$X=\sum_{i_1,\ldots,i_d}a_{i_1,\ldots,i_d}g^1_{i_1}\cdots g^d_{i_d},\tag{15.55}$$
where $a_{i_1,\ldots,i_d}$ are numbers and $g^k_i$ are independent standard Gaussian r.v.s. The sum is finite; each index $i_s$ runs from 1 to m. Our purpose is to estimate the higher moments of the r.v. X as a function of certain characteristics of
$$A:=(a_{i_1,\ldots,i_d})_{i_1,\ldots,i_d\le m}.\tag{15.56}$$


Estimating the higher moments of the r.v. X amounts to estimating its tails, and it is self-evident that this is a natural question. This topic runs into genuine notational difficulties. One may choose to avoid considering tensors, in which case one faces heavy multi-index notation. Or one may entirely avoid multi-index notation by using tensors, but one gets dizzy from the height of the abstraction. We shall not try for elegance in the presentation, but rather to minimize the amount of notation the reader has to assimilate. Our approach will use a dash of tensor vocabulary, but does not require any knowledge of what tensors are. In any case, for the really difficult arguments, we shall focus on the case d = 3.

Let us start with the case d = 2 that we considered at length in the previous section. In that case, one may think of A as a linear functional on $\mathbb R^{m^2}$ by the formula
$$A(x)=\sum_{i,j}a_{i,j}x_{i,j},\tag{15.57}$$
where $x=(x_{i,j})_{i,j\le m}$ is the generic element of $\mathbb R^{m^2}$. It is understood that in (15.57), the sum runs over $i,j\le m$. When we provide $\mathbb R^{m^2}$ with the canonical Euclidean structure, the norm of A viewed as a linear functional on $\mathbb R^{m^2}$ is simply
$$\|A\|_{\{1,2\}}=\Big(\sum_{i,j}a_{i,j}^2\Big)^{1/2}.\tag{15.58}$$
This quantity was denoted $\|A\|_{HS}$ in the previous section, but here we need new notation. We may also think of A as a bilinear functional on $\mathbb R^m\times\mathbb R^m$ by the formula
$$A(x,y)=\sum_{i,j}a_{i,j}x_iy_j,\tag{15.59}$$
where $x=(x_i)_{i\le m}$ and $y=(y_j)_{j\le m}$. In that case, if we provide both copies of $\mathbb R^m$ with the canonical Euclidean structure, the corresponding norm of A is
$$\|A\|_{\{1\}\{2\}}=\sup\Big\{\Big|\sum_{i,j}a_{i,j}x_iy_j\Big|;\ \sum_ix_i^2\le1,\ \sum_jy_j^2\le1\Big\},\tag{15.60}$$
which is also the operator norm when one sees A as a matrix, i.e., an operator from $\mathbb R^m$ to $\mathbb R^m$. One observes the inequality $\|A\|_{\{1\}\{2\}}\le\|A\|_{\{1,2\}}$.
Let us now turn to the case d = 3. One may think of A as a linear functional on $\mathbb R^{m^3}$, obtaining the norm
$$\|A\|_{\{1,2,3\}}=\Big(\sum_{i,j,k}a_{i,j,k}^2\Big)^{1/2},\tag{15.61}$$
or think of A as a trilinear functional on $(\mathbb R^m)^3$, obtaining the norm
$$\|A\|_{\{1\}\{2\}\{3\}}=\sup\Big\{\Big|\sum_{i,j,k}a_{i,j,k}x_iy_jz_k\Big|;\ \sum_ix_i^2\le1,\ \sum_jy_j^2\le1,\ \sum_kz_k^2\le1\Big\}.\tag{15.62}$$
One may also view A as a bilinear function on $\mathbb R^{m^2}\times\mathbb R^m$ by the formula
$$A(x,y)=\sum_{i,j,k}a_{i,j,k}x_{i,j}y_k,\tag{15.63}$$
for $x=(x_{i,j})_{i,j}\in\mathbb R^{m^2}$ and $y=(y_k)\in\mathbb R^m$. One then obtains the norm
$$\|A\|_{\{1,2\}\{3\}}=\sup\Big\{\Big|\sum_{i,j,k}a_{i,j,k}x_{i,j}y_k\Big|;\ \sum_{i,j}x_{i,j}^2\le1,\ \sum_ky_k^2\le1\Big\}.\tag{15.64}$$
We observe the inequality
$$\|A\|_{\{1\}\{2\}\{3\}}\le\|A\|_{\{1,2\}\{3\}}\le\|A\|_{\{1,2,3\}}.\tag{15.65}$$
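More generally, given a partition $\mathcal P=\{I_1,\ldots,I_k\}$ of $\{1,\ldots,d\}$, we may define the norm
$$\|A\|_{\mathcal P}=\|A\|_{I_1,\ldots,I_k}\tag{15.66}$$
by viewing A as a k-linear form C on $F_1\times\cdots\times F_k$ where $F_\ell=\mathbb R^{m^{\mathrm{card}\,I_\ell}}$ and defining
$$\|A\|_{I_1,\ldots,I_k}=\|C\|_{\{1\}\cdots\{k\}},\tag{15.67}$$
where the right-hand side is defined as in (15.62). When the partition $\mathcal P'$ is finer than the partition $\mathcal P$, then
$$\|A\|_{\mathcal P'}\le\|A\|_{\mathcal P}.\tag{15.68}$$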
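For d = 3 the norms (15.61) and (15.64) are easy to compute: the first is the Euclidean norm of all entries, the second is the largest singular value of A reshaped as an $m^2\times m$ matrix. The norm (15.62) is hard to compute exactly in general; the sketch below only produces a lower estimate of it by alternating maximization over x, y, z (a common heuristic of our choosing, not a method from the text). Since the returned value is attained at unit vectors, it never exceeds $\|A\|_{\{1\}\{2\}\{3\}}$, so the printed values are consistent with (15.65).

```python
import numpy as np

def norm_123(A):
    """||A||_{1,2,3}: Euclidean norm of all entries, (15.61)."""
    return float(np.linalg.norm(A.ravel()))

def norm_12_3(A):
    """||A||_{{1,2}{3}}: operator norm of A viewed as an (m^2 x m) matrix, (15.64)."""
    m1, m2, m3 = A.shape
    return float(np.linalg.norm(A.reshape(m1 * m2, m3), 2))

def norm_1_2_3_lower(A, iters=200, seed=0):
    """Lower estimate of ||A||_{{1}{2}{3}} (15.62) by alternating maximization.

    Each step maximizes A(x, y, z) in one unit vector with the other two fixed;
    the returned value is attained at unit vectors, so it never exceeds the norm.
    """
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(A.shape[1]); y /= np.linalg.norm(y)
    z = rng.standard_normal(A.shape[2]); z /= np.linalg.norm(z)
    for _ in range(iters):
        x = np.einsum("ijk,j,k->i", A, y, z); x /= np.linalg.norm(x)
        y = np.einsum("ijk,i,k->j", A, x, z); y /= np.linalg.norm(y)
        z = np.einsum("ijk,i,j->k", A, x, y); z /= np.linalg.norm(z)
    return float(abs(np.einsum("ijk,i,j,k->", A, x, y, z)))

A = np.random.default_rng(3).standard_normal((4, 4, 4))
print(norm_1_2_3_lower(A), norm_12_3(A), norm_123(A))  # non-decreasing, as in (15.65)
```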


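The moments of the r.v. X of (15.55) are then evaluated by the following formula:
Theorem 15.2.1 (R. Latała [48]) For $p\ge1$, we have
$$\frac1{K(d)}\sum_{\mathcal P}p^{\mathrm{card}\,\mathcal P/2}\|A\|_{\mathcal P}\le\|X\|_p\le K(d)\sum_{\mathcal P}p^{\mathrm{card}\,\mathcal P/2}\|A\|_{\mathcal P},\tag{15.69}$$
where $\mathcal P$ runs over all partitions of $\{1,\ldots,d\}$.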
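For d = 1 the only partition of {1} is {{1}}, and X is a centered Gaussian r.v. with standard deviation $\|A\|_{\{1\}}$, so (15.69) reduces to the statement that $(\mathrm E|g|^p)^{1/p}$ is of order $\sqrt p$. Using the classical formula $\mathrm E|g|^p=2^{p/2}\Gamma((p+1)/2)/\sqrt\pi$, the sketch below checks that the ratio to $\sqrt p$ stays between absolute constants for $1\le p\le200$ (the range is an arbitrary choice).

```python
import math

def gaussian_abs_moment_norm(p):
    """(E|g|^p)^(1/p) for a standard Gaussian g, via E|g|^p = 2^(p/2) Gamma((p+1)/2) / sqrt(pi)."""
    log_moment = 0.5 * p * math.log(2.0) + math.lgamma((p + 1) / 2) - 0.5 * math.log(math.pi)
    return math.exp(log_moment / p)

# For d = 1, ||X||_p = ||A||_{1} * (E|g|^p)^(1/p); the ratio below stays between
# absolute constants, which is the factor p^(1/2) in (15.69).
ratios = [gaussian_abs_moment_norm(p) / math.sqrt(p) for p in range(1, 201)]
print(min(ratios), max(ratios))
```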


A multidimensional array as in (15.56) will be called a tensor of order d (the 
value of m may depend on the context). 


Exercise 15.2.2 Generalize Theorem 15.1.4 to a set T of tensors of order d 
using the upper bound of (15.69). Hint: This assumes that you know how to 
transform (15.69) into a tail estimate. 


The proof of the lower bound in (15.69) is significantly easier than the proof of 
the upper bound, and we start with it. 


Proof of the Lower Bound in Theorem 15.2.1 We shall prove this lower bound only for $p\ge2$. First, we observe that for d = 1, this simply reflects the fact that for a standard Gaussian r.v. g, one has $(\mathrm E|g|^p)^{1/p}\ge\sqrt p/L$. (No, this has not been proved anywhere in this book, but see Exercise 2.3.9.) Next, we prove by induction on d that for each d, one has
$$(\mathrm E|X|^p)^{1/p}\ge\frac{\sqrt p}L\Big(\sum_{i_1,\ldots,i_d}a_{i_1,\ldots,i_d}^2\Big)^{1/2}=\frac{\sqrt p}L\|A\|_{\{1,\ldots,d\}}.\tag{15.70}$$




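For this, we consider the random tensor B of order d − 1 given by
$$b_{i_1,\ldots,i_{d-1}}=\sum_{i\le m}a_{i_1,\ldots,i_{d-1},i}g^d_i.$$
Applying the induction hypothesis to B given the r.v.s $g^d_i$ and denoting by E′ expectation given these variables, we obtain
$$(\mathrm E'|X|^p)^{1/p}\ge\frac{\sqrt p}L\Big(\sum_{i_1,\ldots,i_{d-1}}b_{i_1,\ldots,i_{d-1}}^2\Big)^{1/2}.$$
We compute the norm in $L^p$ of both sides, using that for $p\ge2$ one has $(\mathrm E|Y|^p)^{1/p}\ge(\mathrm EY^2)^{1/2}$ and that $\mathrm Eb_{i_1,\ldots,i_{d-1}}^2=\sum_{i\le m}a_{i_1,\ldots,i_{d-1},i}^2$, to obtain
$$(\mathrm E|X|^p)^{1/p}\ge\frac{\sqrt p}L\Big(\sum_{i_1,\ldots,i_d}a_{i_1,\ldots,i_d}^2\Big)^{1/2},$$
which yields (15.70) for d. (It is only at this place that a tiny extra effort is required if $p<2$.)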
Let us now prove by induction over k that
$$(\mathrm E|X|^p)^{1/p}\ge\frac{p^{k/2}}K\|A\|_{I_1,\ldots,I_k}.\tag{15.71}$$
The case k = 1 is (15.70). For the induction from k − 1 to k, let us assume without loss of generality that $I_k=\{r+1,\ldots,d\}$, and let us define an order d − r random tensor C by
$$c_{i_{r+1},\ldots,i_d}=\sum_{i_1,\ldots,i_r}a_{i_1,\ldots,i_d}g^1_{i_1}\cdots g^r_{i_r},$$
so that
$$X=\sum_{i_{r+1},\ldots,i_d}c_{i_{r+1},\ldots,i_d}g^{r+1}_{i_{r+1}}\cdots g^d_{i_d}.$$
Denoting now by $\mathrm E^\sim$ expectation only in the r.v.s $g^\ell_i$ for $r+1\le\ell\le d$, we use (15.70) to obtain
$$(\mathrm E^\sim|X|^p)^{1/p}\ge\frac{\sqrt p}L\Big(\sum_{i_{r+1},\ldots,i_d}c_{i_{r+1},\ldots,i_d}^2\Big)^{1/2}.$$
Consequently, if $x_{i_{r+1},\ldots,i_d}$ are numbers with $\sum_{i_{r+1},\ldots,i_d}x_{i_{r+1},\ldots,i_d}^2\le1$, one gets
$$(\mathrm E^\sim|X|^p)^{1/p}\ge\frac{\sqrt p}L\Big|\sum_{i_{r+1},\ldots,i_d}c_{i_{r+1},\ldots,i_d}x_{i_{r+1},\ldots,i_d}\Big|=\frac{\sqrt p}L\Big|\sum_{i_1,\ldots,i_r}d_{i_1,\ldots,i_r}g^1_{i_1}\cdots g^r_{i_r}\Big|,\tag{15.72}$$
where
$$d_{i_1,\ldots,i_r}=\sum_{i_{r+1},\ldots,i_d}a_{i_1,\ldots,i_d}x_{i_{r+1},\ldots,i_d}.$$
We now compute the $L^p$ norm of both sides of (15.72), using the induction hypothesis to obtain
$$(\mathrm E|X|^p)^{1/p}\ge\frac{p^{k/2}}K\|D\|_{I_1,\ldots,I_{k-1}},$$
where D is the tensor $(d_{i_1,\ldots,i_r})$. The supremum of the norms on the right-hand side over the choices of $(x_{i_{r+1},\ldots,i_d})$ with $\sum x_{i_{r+1},\ldots,i_d}^2\le1$ is $\|A\|_{I_1,\ldots,I_k}$. (A formal definition of these norms by induction over k would be based exactly on this property.) □


Let us denote by $E_1,\ldots,E_d$ copies of $\mathbb R^m$. The idea is that $E_k$ is the copy that corresponds to the k-th index of A. Given a vector $x\in E_d$, we may then define the contraction $(A,x)$ as the tensor $(b_{i_1,\ldots,i_{d-1}})$ of order d − 1 given by
$$b_{i_1,\ldots,i_{d-1}}=\sum_{i\le m}a_{i_1,\ldots,i_{d-1},i}x_i.$$
The summation here is on the d-th index, consistent with the fact that $x\in E_d$.

If G is a standard Gaussian random vector valued in $E_d$, i.e., $G=(g_i)_{i\le m}$, where $g_i$ are independent standard Gaussian r.v.s, then $(A,G)$ is a random tensor of order d − 1. We shall deduce Theorem 15.2.1 from the following fact, of independent interest:


Theorem 15.2.3 Consider $d\ge2$. Then for all $\tau\ge1$, we have
$$\mathrm E\|(A,G)\|_{\{1\}\cdots\{d-1\}}\le K\sum_{\mathcal P}\tau^{\mathrm{card}\,\mathcal P-d+1}\|A\|_{\mathcal P},\tag{15.73}$$
where $\mathcal P$ runs over all the partitions of $\{1,\ldots,d\}$.

Here, as well as in the rest of this section, K denotes a number that depends only on the order of the tensor considered and certainly not on τ.

The bound (15.73) has the mind-boggling feature that the powers of τ in the right-hand side may have different signs. This feature will actually appear very naturally in the course of the proof. It will be used only through Corollary 15.2.4.

If we think of A as a d-linear form on $E_1\times\cdots\times E_d$, then
$$Y:=\|(A,G)\|_{\{1\}\cdots\{d-1\}}=\sup A(x^1,\ldots,x^{d-1},G),\tag{15.74}$$
where the supremum is over all choices of $x^\ell\in E_\ell$ with $\|x^\ell\|\le1$. Therefore, the issue in proving (15.73) is to bound the supremum of a certain complicated Gaussian process.


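Corollary 15.2.4 For all $p\ge1$, one has
$$\big(\mathrm E\|(A,G)\|_{\{1\}\cdots\{d-1\}}^p\big)^{1/p}\le K\sum_{\mathcal P}p^{(\mathrm{card}\,\mathcal P-d+1)/2}\|A\|_{\mathcal P}.\tag{15.75}$$
Proof As witnessed by (15.74), the r.v. $Y=\|(A,G)\|_{\{1\}\cdots\{d-1\}}$ is the supremum of Gaussian r.v.s of the type $Z=A(x^1,\ldots,x^{d-1},G)$, where in this formula we view A as a d-linear map on $E_1\times\cdots\times E_d$ and where $x^\ell$ is a vector of norm ≤ 1. Now, the Gaussian r.v. Z is of the type $Z=\sum_ia_ig_i$, and the formula
$$\Big(\mathrm E\Big(\sum_ia_ig_i\Big)^2\Big)^{1/2}=\Big(\sum_ia_i^2\Big)^{1/2}=\sup\Big\{\sum_ia_ix_i;\ \sum_ix_i^2\le1\Big\}$$
implies
$$(\mathrm EZ^2)^{1/2}=\sup_{\|x\|\le1}\big|A(x^1,\ldots,x^{d-1},x)\big|\le\sigma:=\|A\|_{\{1\}\cdots\{d\}}.$$
It then follows from (2.118) that for $u>0$, the r.v. Y satisfies
$$P(|Y-\mathrm EY|\ge u)\le2\exp\Big(-\frac{u^2}{2\sigma^2}\Big).$$
Then (2.24) implies
$$(\mathrm E|Y-\mathrm EY|^p)^{1/p}\le L\sqrt p\,\sigma,$$
and since $(\mathrm E|Y|^p)^{1/p}\le\mathrm E|Y|+(\mathrm E|Y-\mathrm EY|^p)^{1/p}\le\mathrm E|Y|+L\sqrt p\,\sigma$, using (15.73) for $\tau=p^{1/2}$ to bound $\mathrm E|Y|$, the result follows (the term $L\sqrt p\,\sigma$ is the term of (15.75) corresponding to the partition of $\{1,\ldots,d\}$ into singletons). □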


Proof of the Upper Bound in Theorem 15.2.1 We proceed by induction over d, using also (15.75). For d = 1, (15.69) reflects the growth of the moments of a single Gaussian r.v. as captured by (2.24). Assuming that the result has been proved for d − 1, we prove it for d. We consider the Gaussian random vector $G=(g^d_i)_{i\le m}$ and the order d − 1 random tensor
$$B=(A,G)=(b_{i_1,\ldots,i_{d-1}}),$$
where
$$b_{i_1,\ldots,i_{d-1}}=\sum_{i\le m}a_{i_1,\ldots,i_{d-1},i}g^d_i.$$
Thus,
$$X=\sum_{i_1,\ldots,i_d}a_{i_1,\ldots,i_d}g^1_{i_1}\cdots g^d_{i_d}=\sum_{i_1,\ldots,i_{d-1}}b_{i_1,\ldots,i_{d-1}}g^1_{i_1}\cdots g^{d-1}_{i_{d-1}}.$$
Let us denote by E′ expectation given G. Then the induction hypothesis applied to B implies
$$(\mathrm E'|X|^p)^{1/p}\le K\sum_{\mathcal Q}p^{\mathrm{card}\,\mathcal Q/2}\|B\|_{\mathcal Q},\tag{15.76}$$
where the sum runs over all partitions $\mathcal Q$ of $\{1,\ldots,d-1\}$. We now compute the $L^p$ norm of both sides, using the triangle inequality in $L^p$ to obtain
$$(\mathrm E|X|^p)^{1/p}\le K\sum_{\mathcal Q}p^{\mathrm{card}\,\mathcal Q/2}\big(\mathrm E\|B\|_{\mathcal Q}^p\big)^{1/p}.\tag{15.77}$$
Our next goal is to prove that
$$\big(\mathrm E\|B\|_{\mathcal Q}^p\big)^{1/p}\le Kp^{-\mathrm{card}\,\mathcal Q/2}\sum_{\mathcal P}p^{\mathrm{card}\,\mathcal P/2}\|A\|_{\mathcal P},\tag{15.78}$$
providing the same bound $K\sum_{\mathcal P}p^{\mathrm{card}\,\mathcal P/2}\|A\|_{\mathcal P}$ for each term in the summation of (15.77) and finishing the proof of (15.69).


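To prove (15.78), we denote by $I_1,\ldots,I_k$ the elements of $\mathcal Q$, so that
$$\big(\mathrm E\|B\|_{\mathcal Q}^p\big)^{1/p}=\big(\mathrm E\|(A,G)\|_{I_1,\ldots,I_k}^p\big)^{1/p}.\tag{15.79}$$
For $\ell\le k$, we define $F_\ell=\mathbb R^{m^{\mathrm{card}\,I_\ell}}$, and we define $F_{k+1}=\mathbb R^m$. Let us view A as a (k+1)-linear form C on the space $F_1\times\cdots\times F_{k+1}$. Thus, from (15.67), we have $\|(A,G)\|_{I_1,\ldots,I_k}=\|(C,G)\|_{\{1\}\cdots\{k\}}$. Applying (15.75) to C (with d = k + 1), we get
$$\big(\mathrm E\|(C,G)\|_{\{1\}\cdots\{k\}}^p\big)^{1/p}\le K\sum_{\mathcal R}p^{(\mathrm{card}\,\mathcal R-k)/2}\|C\|_{\mathcal R},$$
where $\mathcal R$ ranges over all partitions of $\{1,\ldots,k+1\}$. Each such $\mathcal R$ corresponds to a partition $\mathcal P$ of $\{1,\ldots,d\}$ coarser than $\{I_1,\ldots,I_k,\{d\}\}$, with $\mathrm{card}\,\mathcal P=\mathrm{card}\,\mathcal R$ and $\|C\|_{\mathcal R}=\|A\|_{\mathcal P}$. Combining with (15.79), we have therefore obtained a stronger form of (15.78), where the summation in the right-hand side is restricted to the partitions $\mathcal P$ which are coarser than the partition $\{I_1,\ldots,I_k,\{d\}\}$ of $\{1,\ldots,d\}$. □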


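The rest of this section is devoted to the proof of Theorem 15.2.3. The proof is by induction over d. The case d = 2 is very simple. We simply write
$$\|(A,G)\|_{\{1\}}=\sup_{\|x\|\le1}A(x,G)=\Big(\sum_i\Big(\sum_ja_{i,j}g_j\Big)^2\Big)^{1/2},$$
and use of the Cauchy–Schwarz inequality proves that
$$\mathrm E\|(A,G)\|_{\{1\}}\le\Big(\sum_{i,j}a_{i,j}^2\Big)^{1/2}=\|A\|_{\{1,2\}},\tag{15.80}$$
while for $\mathrm{card}\,\mathcal P=1$, one has $\mathrm{card}\,\mathcal P-d+1=0$ and $\tau^{\mathrm{card}\,\mathcal P-d+1}=1$.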

In order to help the reader penetrate the very deep ideas involved in the proof 
of Theorem 15.2.3, we will now assume that d = 3. The proof of Theorem 15.2.3 
for the general value of d does not require any essentially new idea, but it is more 
complicated to write. We refer to Latała’s paper for this. We start with some tools
(of fundamental importance). 


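Lemma 15.2.5 Denoting by μ the canonical Gaussian measure on $\mathbb R^m$, then for each closed symmetric set V of $\mathbb R^m$ and each $x\in\mathbb R^m$, one has
$$\mu(V+x)\ge\mu(V)\exp\Big(-\frac{\|x\|^2}2\Big).\tag{15.81}$$
Proof Let us denote by λ Lebesgue's measure on $\mathbb R^m$. Then, using symmetry in the third line, the parallelogram identity and convexity of the exponential in the fourth line, and setting $c=(2\pi)^{-m/2}$,
$$\begin{aligned}\mu(x+V)&=c\int_{x+V}\exp\Big(-\frac{\|y\|^2}2\Big)d\lambda(y)\\&=c\int_V\exp\Big(-\frac{\|x+y\|^2}2\Big)d\lambda(y)\\&=c\int_V\frac12\Big(\exp\Big(-\frac{\|x+y\|^2}2\Big)+\exp\Big(-\frac{\|x-y\|^2}2\Big)\Big)d\lambda(y)\\&\ge c\int_V\exp\Big(-\frac{\|x+y\|^2+\|x-y\|^2}4\Big)d\lambda(y)=c\int_V\exp\Big(-\frac{\|x\|^2}2-\frac{\|y\|^2}2\Big)d\lambda(y)\\&=\exp\Big(-\frac{\|x\|^2}2\Big)\mu(V).\end{aligned}$$
□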
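In dimension m = 1 and for $V=[-r,r]$, both sides of (15.81) are explicit in terms of the Gaussian distribution function Φ, so the inequality can be checked on a grid. This is only a numerical illustration of the lemma for one family of sets chosen by us; the helper names are hypothetical.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mu_interval(r, x):
    """mu(V + x) for V = [-r, r] and mu the standard Gaussian measure on R."""
    return Phi(r + x) - Phi(-r + x)

# Check (15.81), mu(V + x) >= mu(V) exp(-x^2/2), on a grid of radii r and shifts x.
ok = all(
    mu_interval(r, x) >= mu_interval(r, 0.0) * math.exp(-x * x / 2) - 1e-15
    for r in [0.1 * i for i in range(1, 51)]
    for x in [0.1 * j for j in range(-50, 51)]
)
print(ok)  # True
```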
Lemma 15.2.6 Consider a standard Gaussian vector G valued in $\mathbb R^m$ and x in $\mathbb R^m$. Consider a semi-norm α on $\mathbb R^m$. Then
$$P\big(\alpha(G-x)\le4\,\mathrm E\alpha(G)\big)\ge\frac12\exp\Big(-\frac{\|x\|^2}2\Big).\tag{15.82}$$
Proof We consider the set $V=\{y\in\mathbb R^m;\ \alpha(y)\le4\,\mathrm E\alpha(G)\}$, so that by Markov's inequality, $\mu(V^c)\le1/4$ and consequently $\mu(V)\ge3/4$. Then (15.81) implies
$$P\big(\alpha(G-x)\le4\,\mathrm E\alpha(G)\big)=\mu(V+x)\ge\frac34\exp\Big(-\frac{\|x\|^2}2\Big),$$
where we have used in the equality that μ is the law of G and (15.81) in the last inequality. □


We recall the entropy numbers $e_n(T,d)$ of (2.36). The next consequence of Lemma 15.2.6 is called “the dual Sudakov inequality” and is extremely useful to estimate these entropy numbers.

Lemma 15.2.7 Consider a semi-norm α on $\mathbb R^m$ and a standard Gaussian r.v. G valued in $\mathbb R^m$. Then if $d_\alpha$ is the (quasi-)distance associated with α, the Euclidean unit ball B of $\mathbb R^m$ satisfies
$$e_n(B,d_\alpha)\le L2^{-n/2}\,\mathrm E\alpha(G).\tag{15.83}$$
Proof From (15.82), we get
$$P\big(\alpha(G-x)\le4\,\mathrm E\alpha(G)\big)\ge\frac12\exp\Big(-\frac{\|x\|^2}2\Big),$$
and, by homogeneity, for $x\in B$ and $t>0$,
$$P\big(\alpha(tG-x)\le4t\,\mathrm E\alpha(G)\big)\ge\frac12\exp\Big(-\frac1{2t^2}\Big).\tag{15.84}$$
The proof is really similar to the argument of Exercise 2.5.9. We repeat this argument for the convenience of the reader. Consider $\epsilon=4t\,\mathrm E\alpha(G)$ and a subset U of B such that any two points of U are at mutual distances $\ge3\epsilon$. Then the closed¹ balls for $d_\alpha$ of radius ε centered at the points of U are disjoint. Now, (15.84) asserts that the probability that $tG$ belongs to any such ball is $\ge\exp(-1/(2t^2))/2$, so that $\mathrm{card}\,U\le2\exp(1/(2t^2))$. Taking U as large as possible, the balls centered at U of radius $3\epsilon=12t\,\mathrm E\alpha(G)$ cover B. Taking t such that $2\exp(1/(2t^2))=2^{2^n}$ (so that $t\le L2^{-n/2}$), we have covered B by at most $2^{2^n}$ balls of radius $L2^{-n/2}\,\mathrm E\alpha(G)$. □


¹ We take the centers of the balls at mutual distance $>3\epsilon$ to ensure that the closed balls centered
at these points are disjoint.
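The choice of $t$ at the end of the proof of Lemma 15.2.7 can be made explicit. The equation $2\exp(1/(2t^2)) = 2^{2^n}$ means $t^2 = 1/(2(2^n-1)\log 2)$, which requires $n\ge1$. The short Python sketch below (helper names hypothetical) computes this $t$ and checks that $t\,2^{n/2}\le 1/\sqrt{\log 2}$. Under this computation one may therefore take $L = 12/\sqrt{\log 2}$ in (15.83) for $n\ge1$; this explicit value is only what the computation gives, not a constant stated in the text.

```python
import math

def t_for_level(n):
    """The t > 0 solving 2 exp(1/(2 t^2)) = 2^(2^n), for n >= 1."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1.0 / math.sqrt(2.0 * (2 ** n - 1) * math.log(2.0))

def log_count(t):
    """log of 2 exp(1/(2 t^2)); compared in log scale to avoid floating-point overflow."""
    return math.log(2.0) + 1.0 / (2.0 * t * t)

for n in range(1, 8):
    t = t_for_level(n)
    print(n, round(t, 4), round(t * 2 ** (n / 2), 4))
```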
In the next page or so, we make a detour from our main story to provide a proof
of the Sudakov minoration.

To understand the difference between the Sudakov minoration and the dual 
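Sudakov minoration, consider the set $T$ which is the polar set of the unit ball for
$\alpha$. That is, denoting by $\langle x,y\rangle$ the duality of $\mathbb R^m$ with itself,

$$T = \{x\in\mathbb R^m ;\ \forall y\in\mathbb R^m,\ \alpha(y)\le 1\Rightarrow\langle x,y\rangle\le 1\}.$$

Then $\alpha(G) = \sup_{x\in T}\langle x,G\rangle$. According to (2.117), the Sudakov minoration states
that $e_n(T,d)\le L2^{-n/2}\mathbb E\alpha(G)$, where $d$ is the Euclidean distance, whereas (15.83) bounds
in the same manner the entropy numbers of the Euclidean unit ball $B$ for the distance $d_\alpha$.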


Exercise 15.2.8 Consider a symmetric subset $A$ of $\mathbb R^m$ (i.e., $x\in A\Rightarrow -x\in A$)
and the semi-norm $\alpha$ on $\mathbb R^m$ given by $\alpha(y) = \sup\{|\langle x,y\rangle| ;\ x\in A\}$. Prove that
$\mathbb E\alpha(G) = g(A)$.


The following is a simple yet fundamental fact about entropy numbers: 


Lemma 15.2.9 Consider two distances $d_1$ and $d_2$ on $\mathbb R^m$ that arise from semi-norms,
with unit balls $U_1$ and $U_2$, respectively. Then for any set $T\subset\mathbb R^m$, one has

$$e_{n+1}(T,d_2)\le 2e_n(T,d_1)\,e_n(U_1,d_2). \tag{15.85}$$
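Proof Consider $a>e_n(T,d_1)$, so that we can find points $(t_\ell)_{\ell\le N_n}$ of $T$ such that
$T\subset\bigcup_{\ell\le N_n}(t_\ell + aU_1)$. Consider $b>e_n(U_1,d_2)$, so that we can find points $(u_{\ell'})_{\ell'\le N_n}$
for which $U_1\subset\bigcup_{\ell'\le N_n}(u_{\ell'} + bU_2)$. Then

$$T\subset\bigcup_{\ell,\ell'\le N_n}(t_\ell + au_{\ell'} + abU_2).$$

Let

$$I = \{(\ell,\ell') ;\ \ell,\ell'\le N_n,\ (t_\ell + au_{\ell'} + abU_2)\cap T\ne\emptyset\},$$

so that $\operatorname{card}I\le N_n^2 = N_{n+1}$. For $(\ell,\ell')\in I$, let $v_{\ell,\ell'}\in(t_\ell + au_{\ell'} + abU_2)\cap T$.
Then

$$T\subset\bigcup_{(\ell,\ell')\in I}(v_{\ell,\ell'} + 2abU_2),$$

so that $e_{n+1}(T,d_2)\le 2ab$. □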
The following very nice result is due to N. Tomczak-Jaegermann:


Lemma 15.2.10 Consider on $\mathbb R^m$ a distance $d_V$ induced by a norm of unit ball $V$,
and let $V^\circ$ be the polar set of $V$ as in (8.55). Denote by $B$ the Euclidean ball of $\mathbb R^m$
and by $d_2$ the Euclidean distance. Assume that for some numbers $\alpha\ge1$, $A$ and $n^*$,
we have

$$0\le n\le n^*\Rightarrow e_n(B,d_V)\le 2^{-n/\alpha}A. \tag{15.86}$$

Then

$$0\le n\le n^*\Rightarrow e_n(V^\circ,d_2)\le 16\cdot2^{-n/\alpha}A. \tag{15.87}$$


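Proof Consider $n<n^*$. Using (15.85) (with the Euclidean distance in the role of $d_1$, so that
$U_1 = B$, and with $d_V$ in the role of $d_2$) in the first inequality and (15.86) in the
second one, we obtain

$$e_{n+1}(V^\circ,d_V)\le 2e_n(V^\circ,d_2)\,e_n(B,d_V)\le 2^{-n/\alpha+1}A\,e_n(V^\circ,d_2). \tag{15.88}$$

Let us now denote by $\langle\cdot,\cdot\rangle$ the canonical duality of $\mathbb R^m$ with itself, so that if $y\in V$
and $z\in V^\circ$, we have $\langle y,z\rangle\le 1$. Consider $x,t\in V^\circ$, and $a = d_V(x,t)$. Then
$x-t\in 2V^\circ$ and $x-t\in aV$, so that

$$\|x-t\|_2^2 = \langle x-t,x-t\rangle\le 2a,$$

and thus $d_2(x,t)^2\le 2d_V(x,t)$. Consequently, $e_{n+1}(V^\circ,d_2)^2\le 2e_{n+1}(V^\circ,d_V)$.
Combining with (15.88),

$$e_{n+1}(V^\circ,d_2)^2\le 2^{-n/\alpha+2}A\,e_n(V^\circ,d_2),$$

from which (15.87) follows by induction over $n$. □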
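The induction that closes this proof can be checked on the worst case allowed by the last display. The Python sketch below is an illustration, not part of the argument, and its names are hypothetical. It iterates $e_{n+1} = \sqrt{2^{-n/\alpha+2}A\,e_n}$ from $e_0 = 16A$ (the largest value permitted by (15.87) at $n = 0$) and compares the result with $16\cdot2^{-n/\alpha}A$. The condition $\alpha\ge1$ is exactly what makes the induction step $8\cdot2^{-n/\alpha}A\le16\cdot2^{-(n+1)/\alpha}A$ work; for $\alpha<1$ the worst-case sequence overshoots.

```python
import math

def worst_case(alpha, A, n_max):
    """Largest sequence with e_{n+1}^2 <= 2^(-n/alpha + 2) * A * e_n, starting from e_0 = 16 A."""
    e = 16.0 * A
    seq = [e]
    for n in range(n_max):
        e = math.sqrt(2.0 ** (-n / alpha + 2.0) * A * e)
        seq.append(e)
    return seq

def claimed_bound(alpha, A, n):
    """Right-hand side of (15.87)."""
    return 16.0 * 2.0 ** (-n / alpha) * A

def bound_holds(alpha, A=1.0, n_max=30):
    return all(e <= claimed_bound(alpha, A, n) * (1 + 1e-12)
               for n, e in enumerate(worst_case(alpha, A, n_max)))

print(bound_holds(1.0), bound_holds(2.0), bound_holds(0.5))   # True True False
```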


Proof of Sudakov Minoration (Lemma 2.10.2) We keep the previous notations.
Consider a finite subset $T$ of $\mathbb R^m$, and define the semi-norm $\alpha$ on $\mathbb R^m$ by
$\alpha(x) = \sup_{t\in T}|\langle x,t\rangle|$, so that $\mathbb E\alpha(G) = \mathbb E\sup_{t\in T}|X_t|$ where $X_t = \langle G,t\rangle$, and by (15.83),
we have $e_n(B,d_\alpha)\le L2^{-n/2}\mathbb E\alpha(G)$, where $d_\alpha$ is the distance associated with
the semi-norm $\alpha$. Denoting by $V$ the unit ball of the semi-norm $\alpha$, the set $T$
is a subset of $V^\circ$, and Lemma 15.2.10 implies $e_n(T,d_2)\le L2^{-n/2}\mathbb E\alpha(G) = L2^{-n/2}\mathbb E\sup_{t\in T}|X_t|$.
To evaluate $e_n(T,d_2)$, we may by translation assume that
$0\in T$, and by (2.3), we obtain $e_n(T,d_2)\le L2^{-n/2}\mathbb E\sup_{t\in T}X_t$, which is the desired
result by (2.117). □


We go back to our main story. Our next goal is to prove special extensions of 
Lemmas 15.2.6 and 15.2.7 “to the two-dimensional case”. We consider two copies
$E_1$ and $E_2$ of $\mathbb R^m$, and for vectors $y^\ell\in E_\ell$, $y^\ell = (y^\ell_i)_{i\le m}$, we define their tensor
product $y^1\otimes y^2$ as the vector $(z_{i_1,i_2})$ in $\mathbb R^{m^2}$ given by $z_{i_1,i_2} = y^1_{i_1}y^2_{i_2}$. Let us consider
for $\ell\le 2$ independent standard Gaussian vectors $G^\ell$ valued in $E_\ell$, and let us fix
vectors $x^\ell\in E_\ell$. We define

$$U_\emptyset = x^1\otimes x^2;\quad U_{\{1\}} = G^1\otimes x^2;\quad U_{\{2\}} = x^1\otimes G^2;\quad U_{\{1,2\}} = G^1\otimes G^2.$$


We denote by $\|x\|$ the Euclidean norm of a vector $x$ of $E_\ell$.


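Lemma 15.2.11 Set $\mathcal I = \{\{1\},\{2\},\{1,2\}\}$, and consider a semi-norm $\alpha$ on $\mathbb R^{m^2}$.
Then

$$\mathbb P\Big(\alpha(U_{\{1,2\}}-U_\emptyset)\le\sum_{I\in\mathcal I}4^{\operatorname{card}I}\,\mathbb E\alpha(U_I)\Big)\ge\frac14\exp\Big(-\frac12\big(\|x^1\|^2+\|x^2\|^2\big)\Big). \tag{15.89}$$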


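Proof We will deduce the result from (15.82). We consider the quantities

$$S = 4\alpha(G^1\otimes G^2) = 4\alpha(U_{\{1,2\}});\qquad T = 4\alpha(G^1\otimes x^2) = 4\alpha(U_{\{1\}}).$$

We denote by $\mathbb E^2$ conditional expectation given $G^2$, and we consider the events

$$\Omega_1 = \{\alpha(U_{\{2\}}-U_\emptyset)\le 4\mathbb E\alpha(U_{\{2\}})\},\qquad\Omega_2 = \{\alpha(U_{\{1,2\}}-U_{\{2\}})\le\mathbb E^2S\},$$

and

$$\Omega_3 = \{\mathbb E^2S\le 4\mathbb ES+\mathbb ET\}.$$

When these three events occur simultaneously, we have

$$\begin{aligned}
\alpha(U_{\{1,2\}}-U_\emptyset) &\le\alpha(U_{\{1,2\}}-U_{\{2\}})+\alpha(U_{\{2\}}-U_\emptyset)\\
&\le\mathbb E^2S+4\mathbb E\alpha(U_{\{2\}})\\
&\le 4\mathbb ES+\mathbb ET+4\mathbb E\alpha(U_{\{2\}})\\
&=\sum_{I\in\mathcal I}4^{\operatorname{card}I}\,\mathbb E\alpha(U_I),
\end{aligned}\tag{15.90}$$

where we have used that $\mathbb ET = 4\mathbb E\alpha(U_{\{1\}})$ and $\mathbb ES = 4\mathbb E\alpha(U_{\{1,2\}})$. Next, we prove
that

$$\mathbb P(\Omega_1\cap\Omega_3)\ge\frac12\exp\Big(-\frac{\|x^2\|^2}{2}\Big). \tag{15.91}$$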




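For this, we consider on $E_2$ the semi-norms

$$\alpha_1(y) = \alpha(x^1\otimes y);\qquad\alpha_2(y) = 4\mathbb E\alpha(G^1\otimes y).$$

Thus, $\mathbb E^2S = \alpha_2(G^2)$ and $\mathbb ET = \alpha_2(x^2)$. Since $U_{\{2\}}-U_\emptyset = x^1\otimes(G^2-x^2)$, we
have

$$\Omega_1 = \{\alpha_1(G^2-x^2)\le 4\mathbb E\alpha_1(G^2)\},\qquad\Omega_3 = \{\alpha_2(G^2)\le 4\mathbb E\alpha_2(G^2)+\alpha_2(x^2)\}.$$

Consider the convex symmetric set

$$V = \{y\in E_2 ;\ \alpha_1(y)\le 4\mathbb E\alpha_1(G^2),\ \alpha_2(y)\le 4\mathbb E\alpha_2(G^2)\}.$$

Then Markov’s inequality implies that $\mathbb P(G^2\in V)\ge 1/2$, so that (15.81) yields

$$\mathbb P(G^2\in V+x^2)\ge\frac12\exp\Big(-\frac{\|x^2\|^2}{2}\Big). \tag{15.92}$$

The definition of $V$ and the inequality $\alpha_2(G^2)\le\alpha_2(G^2-x^2)+\alpha_2(x^2)$ imply that

$$\{G^2\in V+x^2\}\subset\Omega_1\cap\Omega_3,$$

so (15.92) implies (15.91).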
Finally, we prove that if $\mathbb P^2$ denotes probability given $G^2$, then

$$\mathbb P^2(\Omega_2)\ge\frac12\exp\Big(-\frac{\|x^1\|^2}{2}\Big). \tag{15.93}$$

For this, we may think of $G^2$ as a given deterministic vector of $E_2$. We then consider
on $\mathbb R^m$ the semi-norm $\alpha'$ given by $\alpha'(y) = \alpha(y\otimes G^2)$. Since
$U_{\{1,2\}}-U_{\{2\}} = (G^1-x^1)\otimes G^2$, we have

$$\Omega_2 = \{\alpha'(G^1-x^1)\le\mathbb E^2S = 4\mathbb E^2\alpha'(G^1)\},$$

so that (15.93) follows from (15.82).

Now the events $\Omega_1$ and $\Omega_3$ depend on $G^2$ only, so that (15.93) implies that
$\mathbb P(\Omega_1\cap\Omega_3\cap\Omega_2)\ge\exp(-\|x^1\|^2/2)\,\mathbb P(\Omega_1\cap\Omega_3)/2$, and (15.91) proves that the probability
that $\Omega_1$, $\Omega_2$ and $\Omega_3$ occur simultaneously is at least $2^{-2}\exp(-(\|x^1\|^2+\|x^2\|^2)/2)$.
Combining with (15.90) completes the proof. □
Throughout the remainder of the section, we write

$$B = \{x = (x^1,x^2)\in E_1\times E_2 ;\ \|x^1\|\le 1,\ \|x^2\|\le 1\}, \tag{15.94}$$


and we first draw some consequences of Lemma 15.2.11. 


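Lemma 15.2.12 Consider a subset $T$ of $2B$ and a semi-norm $\alpha$ on $\mathbb R^{m^2}$. Consider
the distance $d_\alpha$ on $T$ defined for $x = (x^1,x^2)$ and $y = (y^1,y^2)$ by

$$d_\alpha(x,y) = \alpha(x^1\otimes x^2-y^1\otimes y^2). \tag{15.95}$$

Let us define

$$\bar\alpha(T) = \sup_{x\in T}\big(\mathbb E\alpha(x^1\otimes G^2)+\mathbb E\alpha(G^1\otimes x^2)\big). \tag{15.96}$$

Then

$$e_n(T,d_\alpha)\le L\big(2^{-n/2}\bar\alpha(T)+2^{-n}\,\mathbb E\alpha(G^1\otimes G^2)\big). \tag{15.97}$$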


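Proof For any $t>0$, using (15.89) for $x^\ell := x^\ell/t$ rather than $x^\ell$, together with the homogeneity of $\alpha$, we obtain

$$\mathbb P\big(\alpha(t^2G^1\otimes G^2-x^1\otimes x^2)\le W\big)\ge\frac14\exp\Big(-\frac{\|x^1\|^2+\|x^2\|^2}{2t^2}\Big),$$

where

$$W = 4t\big(\mathbb E\alpha(x^1\otimes G^2)+\mathbb E\alpha(G^1\otimes x^2)\big)+16t^2\,\mathbb E\alpha(G^1\otimes G^2).$$

When $x = (x^1,x^2)\in T\subset 2B$, one has $\|x^1\|^2+\|x^2\|^2\le 8$ and (recalling (15.96))
$W\le 4t\bar\alpha(T)+16t^2\,\mathbb E\alpha(G^1\otimes G^2)$, so that

$$\mathbb P\big(\alpha(t^2G^1\otimes G^2-x^1\otimes x^2)\le 4t\bar\alpha(T)+16t^2\,\mathbb E\alpha(G^1\otimes G^2)\big)\ge\frac14\exp\Big(-\frac{4}{t^2}\Big). \tag{15.98}$$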


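Let

$$\epsilon = 4t\bar\alpha(T)+16t^2\,\mathbb E\alpha(G^1\otimes G^2),$$

and consider a subset $U$ of $T$ such that any two points of $U$ are at mutual distances
$>3\epsilon$ for $d_\alpha$. Then the sets $\{z\in\mathbb R^{m^2} ;\ \alpha(z-x^1\otimes x^2)\le\epsilon\}$ for $x\in U$ are disjoint,
so that (15.98) implies

$$\operatorname{card}U\le 4\exp(4t^{-2}). \tag{15.99}$$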
Taking $U$ maximal for the inclusion proves that $N(T,d_\alpha,3\epsilon)\le 4\exp(4t^{-2})$.
Choosing $t$ so that this quantity is $2^{2^n}$ finishes the proof. □


We are now ready to start the proof of Theorem 15.2.3 for $d = 3$. For a subset $T$
of $E_1\times E_2$, we define

$$F(T) := \mathbb E\sup_{x\in T}A(x^1,x^2,G). \tag{15.100}$$

Since all our spaces are finite dimensional, this quantity is finite whenever $T$ is
bounded. The goal is to bound

$$F(B) = \mathbb E\|(A,G)\|_{\{1\}\{2\}}. \tag{15.101}$$


We consider the semi-norm $\alpha$ on $\mathbb R^{m^2}$ given for $z = (z_{i,j})_{i,j\le m}$ by

$$\alpha(z) = \Big(\sum_{k\le m}\Big(\sum_{i,j\le m}a_{i,j,k}z_{i,j}\Big)^2\Big)^{1/2}. \tag{15.102}$$

Then the corresponding distance $d_\alpha$ on $E_1\times E_2$ given by (15.95) is the canonical
distance associated with the Gaussian process $X_x = A(x^1,x^2,G)$.² In particular,
we have

$$\mathbb EA(x^1,x^2,G)^2 = d_\alpha(0,x)^2. \tag{15.103}$$
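Lemma 15.2.13 We have

$$\mathbb E\alpha(G^1\otimes x^2)\le\|(A,x^2)\|_{\{1,3\}} \tag{15.104}$$

$$\mathbb E\alpha(x^1\otimes G^2)\le\|(A,x^1)\|_{\{2,3\}} \tag{15.105}$$

$$\mathbb E\alpha(G^1\otimes G^2)\le\|A\|_{\{1,2,3\}}. \tag{15.106}$$

Proof Here, if $A = (a_{i,j,k})$ and $x^2 = (x^2_j)$, then $(A,x^2)$ is the matrix $(b_{i,k})$ where
$b_{i,k} = \sum_j a_{i,j,k}x^2_j$, and $\|(A,x^2)\|_{\{1,3\}} = (\sum_{i,k}b_{i,k}^2)^{1/2}$. To prove (15.104), writing
$G^1 = (g_i)_{i\le m}$, we simply observe that $\alpha(G^1\otimes x^2) = (\sum_k(\sum_i b_{i,k}g_i)^2)^{1/2}$, so that by the
Cauchy-Schwarz inequality $\mathbb E\alpha(G^1\otimes x^2)\le(\mathbb E\alpha(G^1\otimes x^2)^2)^{1/2} = (\sum_{i,k}b_{i,k}^2)^{1/2} = \|(A,x^2)\|_{\{1,3\}}$.
The rest is similar. □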
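For readers who want to see the objects of (15.102) and (15.104) concretely, here is a small pure-Python sketch. All function names are hypothetical, and the tensor $A$ is a random example. It computes $\alpha(z)$, the tensor product $y^1\otimes y^2$ and the matrix $(A,x^2)$. It then checks the exact identity $\mathbb E\alpha(G^1\otimes x^2)^2 = \sum_i\alpha(e_i\otimes x^2)^2 = \|(A,x^2)\|_{\{1,3\}}^2$, where $e_i$ are the basis vectors; this identity holds because $\alpha(G^1\otimes x^2)^2$ is a quadratic form in $G^1$. Finally, it estimates $\mathbb E\alpha(G^1\otimes x^2)$ by Monte Carlo. By the Cauchy-Schwarz inequality, this estimate cannot exceed the square root of the empirical second moment.

```python
import math
import random

def alpha(A, z):
    """Semi-norm (15.102): (sum_k (sum_{i,j} a_{ijk} z_{ij})^2)^(1/2)."""
    m = len(A)
    return math.sqrt(sum(sum(A[i][j][k] * z[i][j] for i in range(m) for j in range(m)) ** 2
                         for k in range(m)))

def tensor(y1, y2):
    """y1 (x) y2 as the m x m array z_{ij} = y1_i * y2_j."""
    return [[u * v for v in y2] for u in y1]

def contract_2(A, x2):
    """The matrix (A, x2) = (b_{ik}) with b_{ik} = sum_j a_{ijk} x2_j."""
    m = len(A)
    return [[sum(A[i][j][k] * x2[j] for j in range(m)) for k in range(m)] for i in range(m)]

def hs_norm(B):
    """Hilbert-Schmidt norm, i.e. the norm ||.||_{1,3} of a matrix (b_{ik})."""
    return math.sqrt(sum(v * v for row in B for v in row))

rng = random.Random(0)
m = 3
A = [[[rng.gauss(0, 1) for _ in range(m)] for _ in range(m)] for _ in range(m)]
x2 = [rng.gauss(0, 1) for _ in range(m)]
bound = hs_norm(contract_2(A, x2))                                  # ||(A, x^2)||_{1,3}

basis = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
second_moment = sum(alpha(A, tensor(e, x2)) ** 2 for e in basis)    # exact E alpha(G^1 (x) x^2)^2

samples = [alpha(A, tensor([rng.gauss(0, 1) for _ in range(m)], x2)) for _ in range(20000)]
mc_mean = sum(samples) / len(samples)
mc_second = sum(s * s for s in samples) / len(samples)
print(bound, math.sqrt(second_moment), mc_mean)
```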


Lemma 15.2.14 For $u = (u^1,u^2)\in E_1\times E_2$ and $T\subset 2B$, one has

$$F(u+T)\le F(T)+2\|(A,u^1)\|_{\{2,3\}}+2\|(A,u^2)\|_{\{1,3\}}. \tag{15.107}$$


² This semi-norm will be used until the end of the proof.
Proof The proof starts with the identity

$$A(x^1+u^1,x^2+u^2,G) = A(x^1,x^2,G)+A(u^1,x^2,G)+A(x^1,u^2,G)+A(u^1,u^2,G). \tag{15.108}$$

We take the supremum over $x\in T$ and then expectation to obtain (using that
$\mathbb EA(u^1,u^2,G) = 0$)

$$F(T+u)\le F(T)+C_1+C_2,$$

where

$$C_1 = \mathbb E\sup_{\|x^2\|\le 2}A(u^1,x^2,G);\qquad C_2 = \mathbb E\sup_{\|x^1\|\le 2}A(x^1,u^2,G).$$

We then apply (15.80) to the tensor $(A,u^1)$ to obtain $C_1\le 2\|(A,u^1)\|_{\{2,3\}}$ and
similarly for $C_2$. □


This result motivates the introduction on $E_1\times E_2$ of the semi-norm

$$\alpha^*(x) = \|(A,x^1)\|_{\{2,3\}}+\|(A,x^2)\|_{\{1,3\}}. \tag{15.109}$$

We may then rewrite (15.107) as

$$F(u+T)\le F(T)+2\alpha^*(u). \tag{15.110}$$

We denote by $d_{\alpha^*}$ the distance on $E_1\times E_2$ associated with the semi-norm $\alpha^*$.
The semi-norm $\alpha^*$ has another use: the quantity $\bar\alpha(T)$ defined in (15.96) satisfies

$$\bar\alpha(T)\le\sup\{\alpha^*(x) ;\ x\in T\}, \tag{15.111}$$

as follows from (15.104) and (15.105).
Lemma 15.2.15 We have

$$e_n(2B,d_{\alpha^*})\le L2^{-n/2}\|A\|_{\{1,2,3\}}. \tag{15.112}$$

Proof A standard Gaussian random vector valued in the space $E_1\times E_2$ is of the
type $(G^1,G^2)$ where $G^1$ and $G^2$ are independent standard Gaussian random vectors.
Proceeding as in (15.80), we get

$$\mathbb E\|(A,G^1)\|_{\{2,3\}}\le\|A\|_{\{1,2,3\}}, \tag{15.113}$$

and similarly $\mathbb E\|(A,G^2)\|_{\{1,3\}}\le\|A\|_{\{1,2,3\}}$, so that

$$\mathbb E\alpha^*(G^1,G^2)\le 2\|A\|_{\{1,2,3\}}.$$

Lemma 15.2.7 then implies the result. □
Exercise 15.2.16 Write the proof of (15.113) in detail. 


Given a point $y\in B$ and $a,b>0$, we define

$$C(y,a,b) = \{x\in B-y ;\ d_\alpha(0,x)\le a,\ d_{\alpha^*}(0,x)\le b\}. \tag{15.114}$$

We further define

$$W(a,b) = \sup\{F(C(y,a,b)) ;\ y\in B\}. \tag{15.115}$$

Since $C(y,a,b)\subset C(y,a',b')$ for $a\le a'$, $b\le b'$, it follows that $W(a,b)$ is
monotone increasing in both $a$ and $b$. This will be used many times without further
mention. The center of the argument is as follows, where we lighten notation by
setting

$$S_1 = \|A\|_{\{1,2,3\}}. \tag{15.116}$$

Lemma 15.2.17 For all values of $a,b>0$ and $n\ge0$, we have

$$W(a,b)\le L2^{n/2}a+Lb+W\big(L2^{-n/2}b+L2^{-n}S_1,\ L2^{-n/2}S_1\big). \tag{15.117}$$
Proof Consider $y\in B$, so that $B-y\subset 2B$ and $T := C(y,a,b)\subset 2B$.
Since $d_{\alpha^*}(0,x) = \alpha^*(x)$, it follows from (15.111) and (15.114) that $\bar\alpha(T)\le b$.
Combining (15.97) and (15.106), we obtain that $e_n(T,d_\alpha)\le\delta := L(2^{-n/2}b+2^{-n}S_1)$.
Using also (15.112), we find a partition of $T = C(y,a,b)$ into $N_{n+1} = 2^{2^{n+1}}$ sets which are
of diameter $\le\delta$ for $d_\alpha$ and of diameter $\le\delta^* := L2^{-n/2}S_1$ for $d_{\alpha^*}$. Thus, we can find
points $y_i\in C(y,a,b)$ for $i\le N_{n+1}$ such that

$$C(y,a,b) = \bigcup_{i\le N_{n+1}}T_i, \tag{15.118}$$

where

$$T_i = \{x\in E_1\times E_2 ;\ x\in C(y,a,b),\ d_\alpha(y_i,x)\le\delta,\ d_{\alpha^*}(y_i,x)\le\delta^*\}.$$

For $x\in B-y$, we have $x-y_i\in B-(y+y_i)$, so that

$$T_i-y_i\subset C(y+y_i,\delta,\delta^*) \tag{15.119}$$
and thus

$$T_i\subset y_i+C(y+y_i,\delta,\delta^*). \tag{15.120}$$

Also, since $y_i\in B-y$, we have $y+y_i\in B$, so that

$$F(C(y+y_i,\delta,\delta^*))\le W(\delta,\delta^*),$$

and combining with (15.120) and (15.110), and since $\alpha^*(y_i) = d_{\alpha^*}(y_i,0)\le b$
because $y_i\in C(y,a,b)$, we obtain

$$F(T_i)\le W(\delta,\delta^*)+2b. \tag{15.121}$$

Furthermore, since $T_i\subset C(y,a,b)$, for $x\in T_i$, we have $d_\alpha(0,x)\le a$.
Recalling (15.100) and (15.103), it follows from (2.124) that

$$F\Big(\bigcup_{i\le N_{n+1}}T_i\Big)\le La\sqrt{\log N_{n+1}}+\max_{i\le N_{n+1}}F(T_i).$$

Combining with (15.121) and (15.118) implies

$$F(C(y,a,b))\le La2^{n/2}+2b+W(\delta,\delta^*),$$

which is the desired conclusion. □


Proposition 15.2.18 For $n\ge0$, we have

$$W(a,b)\le L\big(2^{n/2}a+b+2^{-n/2}S_1\big). \tag{15.122}$$

Proof of Theorem 15.2.3 for $d = 3$ We set

$$S_3 = \|A\|_{\{1\}\{2\}\{3\}},\qquad S_2 = \|A\|_{\{1\}\{2,3\}}+\|A\|_{\{2\}\{1,3\}}+\|A\|_{\{3\}\{1,2\}}.$$

Since $\alpha(x^1\otimes x^2) = \sup\{A(x^1,x^2,x^3) ;\ \|x^3\|\le 1\}$, we have $d_\alpha(x,0)\le S_3$ for
$x\in B$. Also by (15.109), we have $d_{\alpha^*}(0,x) = \alpha^*(x)\le S_2$ for $x\in B$. Therefore,
$B\subset C(0,S_3,S_2)$, so that using (15.122),

$$F(B)\le W(S_3,S_2)\le L\big(2^{n/2}S_3+S_2+2^{-n/2}S_1\big).$$

Recalling (15.101) and choosing $n$ so that $2^{n/2}$ is about $t$ proves (15.73). □
Proof of Proposition 15.2.18 Using the monotonicity of $W$ in $a$ and $b$, one may
assume that all the constants in (15.117) are the same. Denoting by $L_0$ this
constant, (15.117) implies in particular that for $n,n_0\ge0$, one has

$$W(a,b)\le L_02^{(n+n_0)/2}a+L_0b+W\big(L_02^{-(n+n_0)/2}b+L_02^{-(n+n_0)}S_1,\ L_02^{-(n+n_0)/2}S_1\big).$$

Fixing $n_0$ a universal constant such that $L_02^{-n_0/2}\le 2^{-2}$ implies that for $n\ge0$, one
has

$$W(a,b)\le L2^{n/2}a+Lb+W\big(2^{-n/2-2}b+2^{-n-2}S_1,\ 2^{-n/2-2}S_1\big). \tag{15.123}$$

Using this for $a = 2^{-n}S_1$ and $b = 2^{-n/2}S_1$, we obtain

$$W(2^{-n}S_1,2^{-n/2}S_1)\le L2^{-n/2}S_1+W(2^{-n-1}S_1,2^{-(n+1)/2}S_1).$$

Given $r\ge0$, summation of these relations for $n\ge r$ implies

$$W(2^{-r}S_1,2^{-r/2}S_1)\le L2^{-r/2}S_1. \tag{15.124}$$

Using this relation, we then deduce from (15.123) that

$$W\big(2^{-n/2}b+2^{-n}S_1,\ 2^{-n/2}S_1\big)\le L\big(b+2^{-n/2}S_1\big),$$

and bounding the last term of (15.123) using this inequality yields (15.122). □
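The summation step leading to (15.124) can be written out as follows, with $L$ the constant of the display just before (15.124). The remainder is discarded using the bound $W(a,b)\le a\,\mathbb E\|G\|$. This bound is an added justification, not taken from the text: by (15.102) and the Cauchy-Schwarz inequality, $|A(x^1,x^2,G)|\le\alpha(x^1\otimes x^2)\|G\| = d_\alpha(0,x)\|G\|$, and $d_\alpha(0,x)\le a$ on $C(y,a,b)$. Iterating the relation for $n = r,\dots,N-1$ gives

```latex
\begin{aligned}
W(2^{-r}S_1,2^{-r/2}S_1)
 &\le \sum_{n=r}^{N-1} L\,2^{-n/2}S_1 + W(2^{-N}S_1,2^{-N/2}S_1)\\
 &\le \frac{L}{1-2^{-1/2}}\,2^{-r/2}S_1 + 2^{-N}S_1\,\mathbb E\|G\| .
\end{aligned}
```

Letting $N\to\infty$ yields (15.124), with the constant $L/(1-2^{-1/2})\approx 3.41\,L$.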


We strongly encourage the reader to carry out the proof in the case d = 4, 
using (15.122) and the induction hypothesis. 


15.3 Notes and Comments


Our exposition of Latała’s result in Sect. 15.2 brings no new idea whatsoever
compared to his original paper [48]. (Improving the mathematics of Rafał Latała
seems extremely challenging.) Whatever part of the exposition might be better than
in the original paper draws heavily on J. Lehec’s paper [54]. This author found [48]
very difficult to read and included Sect. 15.2 in an effort to make these beautiful
ideas more accessible. It seems most probable that Latała started his work with the
case d = 3, but one has to do significant reverse engineering to get this less technical
case out of his paper.


Chapter 16
Convergence of Orthogonal Series: Majorizing Measures


16.1 A Kind of Prologue: Chaining in a Metric Space and Pisier’s Bound


As will become apparent soon, the questions considered in this chapter have a lot to 
do with the control of stochastic processes which satisfy the condition (1.17) for the 
function $\varphi(x) = x^2$. Before we get into the sophisticated considerations of the next
few sections, it could be helpful to learn a simpler and rather robust method (even 
though it is not directly connected to most of the subsequent material). The present 
section is a continuation of Sect. 1.4 which the reader should review now. 


Definition 16.1.1 We say that a function $\varphi:\mathbb R\to\mathbb R$ is a Young function if $\varphi(0) = 0$,
$\varphi(-x) = \varphi(x)$, $\varphi$ is convex, and $\varphi\not\equiv 0$.


On a metric space $(T,d)$, we consider processes $(X_t)_{t\in T}$ that satisfy the
condition

$$\forall s,t\in T ;\quad\mathbb E\varphi\Big(\frac{X_s-X_t}{d(s,t)}\Big)\le 1. \tag{16.1}$$


Condition (16.1) is quite natural, because given the process $(X_t)_{t\in T}$, it is simple to
show that the quantity

$$d(s,t) = \inf\Big\{u>0 ;\ \mathbb E\varphi\Big(\frac{X_s-X_t}{u}\Big)\le 1\Big\} \tag{16.2}$$

is a quasi-distance on $T$, for which (16.1) is satisfied.¹


¹ It would also be natural to consider processes where the size of the “increments” $X_s-X_t$ is
controlled by a distance $d$ in a different manner, e.g., for all $u>0$, $\mathbb P(|X_s-X_t|\ge ud(s,t))\le$
We are interested in bounding processes which satisfy the condition (16.1). We
have already established the bound (1.18). In this bound occurs the term $N(T,d,\epsilon)^2$
rather than $N(T,d,\epsilon)$. This does not matter if $\varphi(x) = \exp(x^2/4)-1$ (so that
$\varphi^{-1}(N^2) = 2\sqrt{\log(N^2+1)}\le 2\varphi^{-1}(N)$), but it does matter if, say, $\varphi(x) = |x|^2$.
We really do not have the right integral on the right-hand side. In this section, we
show how to correct this, illustrating again that even in a structure as general as a
metric space, not all arguments are trivial. The same topic will be addressed again
in Sect. 16.8 at a higher level of sophistication.

To improve the brutal chaining argument leading to (1.18), without loss of
generality, we assume that $T$ is finite. For $n\ge n_0$, we consider a map $\theta_n : T_{n+1}\to T_n$
such that $d(\theta_n(t),t)\le 2^{-n}$ for each $t\in T_{n+1}$. Since we assume that $T$ is
finite, we have $T = T_m$ when $m$ is large enough. We fix such an $m$, and we
define $\pi_n(t) = t$ for each $t\in T$ and each $n\ge m$. Starting with $n = m$, we then
define recursively $\pi_n(t) = \theta_n(\pi_{n+1}(t))$ for $n\ge n_0$. The point of this construction
is that $\pi_{n+1}(t)$ determines $\pi_n(t)$, so that there are at most $N(T,d,2^{-n-1})$ pairs
$(\pi_{n+1}(t),\theta_n(\pi_{n+1}(t))) = (\pi_{n+1}(t),\pi_n(t))$, and the bound (1.13) implies


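$$\mathbb E\sup_{t\in T}|X_{\pi_{n+1}(t)}-X_{\pi_n(t)}|\le 2^{-n}\varphi^{-1}\big(N(T,d,2^{-n-1})\big). \tag{16.3}$$

Using the chaining identity

$$X_t-X_{\pi_n(t)} = \sum_{k\ge n}\big(X_{\pi_{k+1}(t)}-X_{\pi_k(t)}\big),$$

we have proved the following:

Lemma 16.1.2 We have

$$\mathbb E\sup_{t\in T}|X_t-X_{\pi_n(t)}|\le\sum_{k\ge n}2^{-k}\varphi^{-1}\big(N(T,d,2^{-k-1})\big). \tag{16.4}$$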
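The construction of the maps $\theta_n$ and $\pi_n$ is easy to carry out on a concrete finite set. The sketch below is only an illustration under stated assumptions. It takes $T$ to be a finite set of reals with $d(s,t) = |s-t|$. It also replaces the nets $T_n$ of Sect. 1.4 by greedy nets: every point of $T$ is within $2^{-n}$ of $T_n$, and the points of $T_n$ are at mutual distances $>2^{-n}$, so $\operatorname{card}T_n\le N(T,d,2^{-n-1})$, though possibly $>N(T,d,2^{-n})$. The code checks that $d(\pi_{n+1}(t),\pi_n(t))\le 2^{-n}$, that the pairs $(\pi_{n+1}(t),\pi_n(t))$ number at most $\operatorname{card}T_{n+1}$ (the point used in (16.3)), and the chaining identity. All names are hypothetical.

```python
import random

def dist(a, b):
    """The metric d(s, t) = |s - t| on a set of reals (an illustrative choice)."""
    return abs(a - b)

def greedy_net(points, r):
    """Greedy subset N of points: pairwise distances > r, and every point is within r of N."""
    net = []
    for p in points:
        if all(dist(p, q) > r for q in net):
            net.append(p)
    return net

def chaining_maps(points, n0, m):
    """Nets T_n (n0 <= n < m) of radius 2^-n and T_m = T; maps theta_n : T_{n+1} -> T_n with
    d(theta_n(t), t) <= 2^-n; maps pi_n with pi_m(t) = t and pi_n(t) = theta_n(pi_{n+1}(t))."""
    nets = {n: greedy_net(points, 2.0 ** -n) for n in range(n0, m)}
    nets[m] = list(points)
    theta = {n: {t: min(nets[n], key=lambda s, t=t: dist(s, t)) for t in nets[n + 1]}
             for n in range(n0, m)}
    pi = {m: {t: t for t in points}}
    for n in range(m - 1, n0 - 1, -1):
        pi[n] = {t: theta[n][pi[n + 1][t]] for t in points}
    return nets, theta, pi

rng = random.Random(1)
T = sorted({rng.random() for _ in range(300)})
nets, theta, pi = chaining_maps(T, 0, 12)
for n in range(0, 12):
    pairs = {(pi[n + 1][t], pi[n][t]) for t in T}
    print(n, len(nets[n]), len(pairs), max(dist(a, b) for a, b in pairs) <= 2.0 ** -n)
```

With $\varphi(x) = x^2$, one has $\varphi^{-1}(N) = \sqrt N$, and plugging `len(nets[k + 1])` into the right-hand side of (16.4) gives a computable, if cruder, bound.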


Taking $n = n_0$, this yields the following clean result (due to G. Pisier):

Theorem 16.1.3 (G. Pisier) We have

$$\mathbb E\sup_{s,t\in T}|X_s-X_t|\le L\int_0^{\Delta(T)}\varphi^{-1}\big(N(T,d,\epsilon)\big)\,d\epsilon. \tag{16.5}$$


In this chapter, we will learn how to go beyond the bound (16.5) when the
function $\varphi(x)$ has a much weaker rate of growth than $\exp x^2-1$, and first of all,
$\varphi(x) = x^2$.


$\psi(u)$, for a given function $\psi$, see [143]. This question has received considerably less attention than
the condition (16.1).
In contrast with (1.19), Theorem 16.1.3 does not provide a uniform modulus of
continuity. We investigate this side story in the rest of this section. A clever twist is 
required in the argument. 


Theorem 16.1.4 For any $\delta>0$, $n\ge n_0$, we have

$$\mathbb E\sup_{d(s,t)\le\delta}|X_s-X_t|\le\delta\varphi^{-1}\big(N(T,d,2^{-n})^2\big)+4\sum_{k\ge n}2^{-k}\varphi^{-1}\big(N(T,d,2^{-k-1})\big). \tag{16.6}$$


To ensure that the right-hand side is small, we fix $n$ large enough so that the sum
is small, and then we take $\delta$ small enough that the first term of the right-hand side is
small.


Proof We fix $n$ and we set $Z = \sup_{t\in T}|X_t-X_{\pi_n(t)}|$. We define

$$V = \{(\pi_n(s),\pi_n(t)) ;\ d(s,t)\le\delta\}\subset T_n\times T_n.$$

Given $s,t\in T$ with $d(s,t)\le\delta$, we have $|X_s-X_{\pi_n(s)}|\le Z$ and $|X_t-X_{\pi_n(t)}|\le Z$,
so that $|X_s-X_t|\le|X_{\pi_n(s)}-X_{\pi_n(t)}|+2Z$, and thus

$$\sup_{d(s,t)\le\delta}|X_s-X_t|\le\sup_{(a,b)\in V}|X_a-X_b|+2Z. \tag{16.7}$$

For $(a,b)\in V$, we choose $(s(a,b),t(a,b))\in T\times T$ such that $d(s(a,b),t(a,b))\le\delta$
and $a = \pi_n(s(a,b))$, $b = \pi_n(t(a,b))$. Thus, $|X_a-X_{s(a,b)}|\le Z$ and $|X_b-X_{t(a,b)}|\le Z$,
so that $|X_a-X_b|\le|X_{s(a,b)}-X_{t(a,b)}|+2Z$. Combining with (16.7),

$$\sup_{d(s,t)\le\delta}|X_s-X_t|\le\sup_{(a,b)\in V}|X_{s(a,b)}-X_{t(a,b)}|+4Z. \tag{16.8}$$

Using (1.13), since $\operatorname{card}V\le N(T,d,2^{-n})^2$, we have

$$\mathbb E\sup_{(a,b)\in V}|X_{s(a,b)}-X_{t(a,b)}|\le\delta\varphi^{-1}\big(N(T,d,2^{-n})^2\big),$$

so that taking expectation in (16.8) and using (16.4) completes the proof. □


16.2 Introduction to Orthogonal Series: Paszkiewicz’s Theorem


An orthonormal sequence $(\varphi_m)_{m\ge1}$ on a probability space $(\Omega,\mathbb P)$ is a sequence such
that $\mathbb E\varphi_m^2 = 1$ for each $m$ and $\mathbb E\varphi_m\varphi_n = 0$ for $m\ne n$. A classical question asks which
are the sequences $(a_m)$ for which the series

$$\sum_{m\ge1}a_m\varphi_m \tag{16.9}$$

converges a.s. whatever the choice of the orthonormal sequence $(\varphi_m)$ and of the
probability space. (See Sect. 16.10 for comments on this question.) Since the series
$\sum_{m\ge1}a_m\varepsilon_m$ must converge a.s., where $\varepsilon_m$ are independent Bernoulli r.v.s (which form an
orthonormal sequence), we have $\sum_{m\ge1}a_m^2<\infty$ (see Exercise 6.3.4). As we shall see, the
condition $\sum_{m\ge1}a_m^2<\infty$ is however far from sufficient: there exist an orthonormal sequence
$(\varphi_m)$ and coefficients $a_m$ such that $\sum_{m\ge1}a_m^2<\infty$ and the series $\sum_{m\ge1}a_m\varphi_m$ diverges
everywhere.
Let us consider the set

$$T = \Big\{\sum_{m\le n}a_m^2 ;\ n\ge1\Big\}. \tag{16.10}$$

Since $\sum_{m\ge1}a_m^2<\infty$, we may assume without loss of generality that $T\subset\,]0,1]$.
We may also assume that $a_m\ne0$ for each $m$. Let us denote by $\mathcal I_n$ the family of the
$2^n$ dyadic intervals $](i-1)2^{-n},i2^{-n}]$ for $1\le i\le 2^n$. For a point $t\in\,]0,1]$, we
denote by $I_n(t)$ the unique interval of $\mathcal I_n$ that contains $t$.


Theorem 16.2.1 (A. Paszkiewicz [81]) Given the sequence $(a_m)$, and hence the
set $T$, the following are equivalent:

(a) The series (16.9) converges a.s. for every choice of the orthonormal sequence $(\varphi_m)$.
(b) There exists a probability measure $\mu$ on $T$ such that

$$\sup_{t\in T}\sum_{n\ge0}\sqrt{\frac{2^{-n}}{\mu(I_n(t))}}<\infty. \tag{16.11}$$

(c) There exists a number $B$ such that for every probability measure $\mu$ on $T$, one
has

$$\sum_{n\ge0}\sum_{I\in\mathcal I_n}\sqrt{2^{-n}\mu(I)}\le B. \tag{16.12}$$

(d) There exists a number $B'$ such that for each process $(X_t)_{t\in T}$ which satisfies

$$\forall s,t\in T,\quad\mathbb E(X_s-X_t)^2\le|s-t|, \tag{16.13}$$

we have

$$\mathbb E\sup_{s,t\in T}|X_s-X_t|\le B'. \tag{16.14}$$

(e) For each process $(X_t)_{t\in T}$ which satisfies (16.13), $\lim_{k\to\infty}X_{t_k}$ exists a.s., where
$t_k = \sum_{m\le k}a_m^2$.
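To get a feeling for the quantities in (16.11) and (16.12), the following Python sketch evaluates truncations of both. The truncation to levels $n\le n_{\max}$ is a choice made here for computability. The sketch takes a finite set $T\subset\,]0,1]$ with a probability measure $\mu$, given as a dictionary of point masses. The dyadic index follows $I_n(t) = ](i-1)2^{-n},i2^{-n}]$ with $i = \lceil t2^n\rceil$. For the uniform measure on $T = \{i2^{-K} ;\ 1\le i\le 2^K\}$, every level $n\le K$ contributes exactly $1$ to both sums, which illustrates how (16.12) can grow with the size of $T$. The function names are hypothetical.

```python
import math
from collections import defaultdict

def dyadic_index(t, n):
    """Index i with t in ](i-1)2^-n, i 2^-n]; scaling by 2^n is exact in binary floating point."""
    return math.ceil(t * 2 ** n)

def level_masses(mu, n):
    """mu(I) for the intervals I of I_n that carry mass."""
    mass = defaultdict(float)
    for t, w in mu.items():
        mass[dyadic_index(t, n)] += w
    return mass

def sum_16_12(mu, n_max):
    """sum over n <= n_max of sum_{I in I_n} sqrt(2^-n mu(I))."""
    return sum(math.sqrt(2.0 ** -n * w)
               for n in range(n_max + 1) for w in level_masses(mu, n).values())

def sup_16_11(mu, n_max):
    """max over t in the support of mu of sum_{n <= n_max} sqrt(2^-n / mu(I_n(t)))."""
    masses = [level_masses(mu, n) for n in range(n_max + 1)]
    return max(sum(math.sqrt(2.0 ** -n / masses[n][dyadic_index(t, n)]) for n in range(n_max + 1))
               for t in mu)

K = 6
uniform = {i / 2 ** K: 1.0 / 2 ** K for i in range(1, 2 ** K + 1)}
print(sum_16_12(uniform, K), sup_16_11(uniform, K))   # 7.0 7.0
```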
At this stage, this theorem should look completely mysterious. It will take the
work of several sections to clarify the underlying issues. Let us start with some
simple observations. The conditions (b) to (e) do not involve orthonormal series,
but only the set $T$. This set $T$ is just the set of points of the increasing sequence
$t_k = \sum_{m\le k}a_m^2$. Such a set has the notable property that its closure is the set
$T\cup\{t^*\}$, where $t^* = \lim_{k\to\infty}t_k = \sum_{m\ge1}a_m^2$. The condition (16.13) is the special
case of Condition (16.1) where $\varphi(x) = x^2$ and $d(s,t) = \sqrt{|s-t|}$. Thus, Pisier’s
bound (16.5) ensures that (d) holds when the integral $\int_0^1\sqrt{N(T,d,\epsilon)}\,d\epsilon$ is
finite or, equivalently, when $\sum_{n\ge0}\sqrt{2^{-n}N(T,d,2^{-n/2})}<\infty$. However, the necessary
and sufficient conditions (b) and (c) are somewhat weaker than this condition
(although this is not obvious yet). In Sect. 16.8, we will provide considerable
generalizations of the equivalence of (b) and (c) on the one hand and (d) on the
other hand, but we will first provide specific proofs of this fact in the context of
Paszkiewicz’s theorem.


Exercise 16.2.2 Prove that the condition $\sum_{n\ge0}\sqrt{2^{-n}N(T,d^2,2^{-n})}<\infty$ is
equivalent to the condition

$$\sum_{n\ge0}\sqrt{2^{-n}\operatorname{card}\{I\in\mathcal I_n ;\ I\cap T\ne\emptyset\}}<\infty. \tag{16.15}$$

Prove that under this condition, (c) and (d) of Theorem 16.2.1 hold.
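The quantity $\operatorname{card}\{I\in\mathcal I_n ;\ I\cap T\ne\emptyset\}$ in (16.15) is straightforward to compute for a finite set of points. The sketch below (hypothetical names) does this for $T = \{t_k\}$ built from a finite list of squared coefficients $a_m^2$, under the assumption $\sum_m a_m^2\le1$. It also evaluates a truncation of the series (16.15). For the example $a_m^2 = 2^{-m}$, $m\le M$, one has $t_k = 1-2^{-k}$ exactly in binary floating point, and for $n<M$ exactly $n+1$ intervals of $\mathcal I_n$ meet $T$.

```python
import math

def partial_sums(a_sq):
    """T = {t_k = sum_{m <= k} a_m^2 ; k >= 1} from a finite list a_sq = [a_1^2, a_2^2, ...]."""
    t, s = [], 0.0
    for v in a_sq:
        s += v
        t.append(s)
    return t

def count_meeting(T, n):
    """card{I in I_n ; I meets T}, where I_n consists of the dyadic intervals ](i-1)2^-n, i 2^-n]."""
    return len({math.ceil(t * 2 ** n) for t in T})

def series_16_15(T, n_max):
    """Truncation at n <= n_max of sum_n sqrt(2^-n card{I in I_n ; I meets T})."""
    return sum(math.sqrt(2.0 ** -n * count_meeting(T, n)) for n in range(n_max + 1))

M = 20
T = partial_sums([2.0 ** -m for m in range(1, M + 1)])
print([count_meeting(T, n) for n in range(6)], series_16_15(T, 30))
```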


16.3 Recovering the Classical Results 


In this section, we recover two classical results from Theorem 16.2.1. This will help 
us get a feeling for the conditions of this theorem. On a less positive note, it will 
also illustrate that working with these conditions is not as easy as what one would 
like it to be. 


Corollary 16.3.1 (Rademacher-Menchov [66, 90]) If

$$\sum_{m\ge1}a_m^2(\log m)^2<\infty, \tag{16.16}$$

then for each choice of the orthonormal sequence $(\varphi_m)$, the series $\sum_{m\ge1}a_m\varphi_m$
converges a.s.


Proof We shall prove that (c) is satisfied. We consider a probability measure $\mu$ on
$T$, and we aim to bound the left-hand side of (16.12). The plan is for each $n$ to split
the sum $\sum_{I\in\mathcal I_n}\sqrt{2^{-n}\mu(I)}$ into several suitable pieces and to bound each of them
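using the Cauchy-Schwarz inequality in the following form: For $\mathcal J\subset\mathcal I_n$,

$$\sum_{I\in\mathcal J}\sqrt{2^{-n}\mu(I)} = 2^{-n/2}\sum_{I\in\mathcal J}\sqrt{\mu(I)}\le 2^{-n/2}\sqrt{\operatorname{card}\mathcal J}\,\sqrt{\mu\Big(\bigcup_{I\in\mathcal J}I\Big)}, \tag{16.17}$$

where we have used that for a disjoint family $\mathcal J$ of intervals, $\sum_{I\in\mathcal J}\mu(I) = \mu(\bigcup_{I\in\mathcal J}I)$.
But, first, we must reformulate (16.16). For $n\ge1$, let $t_n = \sum_{m\le n}a_m^2$.
For $k\ge0$, let $u_k = t_{2^{2^k}}$, so that

$$\sum_{k\ge0}2^{2k}(u_{k+1}-u_k) = \sum_{k\ge0}\ \sum_{2^{2^k}<m\le 2^{2^{k+1}}}2^{2k}a_m^2\le L\sum_{m\ge2}a_m^2(\log m)^2<\infty, \tag{16.18}$$

using that $2^k\le L\log m$ for $m>2^{2^k}$. In particular, denoting by $C$ the left-hand side of (16.18),
we have $u_{k+1}-u_k\le C2^{-2k}$, so that if $t^* = \sum_{m\ge1}a_m^2$, then $t^*-u_k = \sum_{r\ge k}(u_{r+1}-u_r)\le C2^{-2k}$.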

We now fix $k$ and consider $2^k\le n<2^{k+1}$ and turn to the task of splitting the
sum $\sum_{I\in\mathcal I_n}\sqrt{2^{-n}\mu(I)}$ into suitable pieces. Consider $I\in\mathcal I_n$ with $\mu(I)>0$, so that
$I\cap T\ne\emptyset$. We claim that at least one of the following four cases must occur:

• $I$ contains a point $u_p$ for $k-1\le p\le 2k$ or contains $t^*$;
• $I\subset\,]0,u_{k-1}]$;
• $I\subset\,]u_\ell,u_{\ell+1}]$ for some $k-1\le\ell<2k$;
• $I\subset\,]u_{2k},t^*]$.

To see this, we simply observe that if the interval $I$ does not contain either the
point $t^*$ or one of the points $u_p$ for $k-1\le p\le 2k$, then it must be contained in one of
the subintervals of $[0,1]$ created when removing these points from $[0,1]$. However,
since $T\subset[0,t^*]$, it cannot be contained in the interval $]t^*,1]$, so that it is contained
in one of the other intervals left, which is exactly what the last three bullets state.

Consequently, for $2^k\le n<2^{k+1}$, we may write

$$\sum_{I\in\mathcal I_n}\sqrt{2^{-n}\mu(I)}\le\mathrm{I}+\mathrm{II}+\mathrm{III}+\sum_{k-1\le\ell<2k}\mathrm V(\ell), \tag{16.19}$$

where

• I is the sum over $I\subset\,]0,u_{k-1}]$. This sum has at most $2^{2^{k-1}}$ non-zero terms,
because when $I\cap T\ne\emptyset$, $I$ must contain at least one point $t_m$ with $m\le 2^{2^{k-1}}$.
Then (16.17) implies that the sum is $\le 2^{-n/2}2^{2^{k-2}}\le 2^{-2^{k-2}}$.

• II is the sum over the intervals $I$ that contain a point $u_p$ for $k-1\le p\le 2k$
or that contain the point $t^*$. This sum has at most $k+3$ terms and is bounded in the same
manner, by $2^{-n/2}\sqrt{k+3}$.
• III is the sum over the family of intervals $I$ contained in $]u_{2k},t^*]$. Here we use
that, if $u<v$, and $\mathcal J$ denotes the family of intervals $I\in\mathcal I_n$, $I\subset\,]u,v]$, then

$$\sum_{I\in\mathcal J}\sqrt{2^{-n}\mu(I)}\le\sqrt{v-u}\,\sqrt{\mu(]u,v])}. \tag{16.20}$$

This follows from (16.17) because $\operatorname{card}\mathcal J\le 2^n(v-u)$ since the intervals $I$ of
$\mathcal I_n$ have length $2^{-n}$. Thus, $\mathrm{III}\le\sqrt{t^*-u_{2k}}\le\sqrt C\,2^{-2k}$.

• V($\ell$) is the sum over the intervals $I\subset\,]u_\ell,u_{\ell+1}]$, which, as witnessed by (16.20),
is bounded by $\sqrt{u_{\ell+1}-u_\ell}\,\sqrt{\mu(]u_\ell,u_{\ell+1}])}$.


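Summation of the inequalities (16.19) over $n$ with $2^k\le n<2^{k+1}$ and then over
$k$ yields that for a certain number $C'$ (which does not depend on $\mu$)

$$\sum_{n\ge0}\sum_{I\in\mathcal I_n}\sqrt{2^{-n}\mu(I)}\le C'+\sum_{k\ge1}2^k\sum_{k-1\le\ell<2k}\mathrm V(\ell).$$

We want to prove that this quantity is finite. First,

$$\sum_{k\ge1}2^k\sum_{k-1\le\ell<2k}\mathrm V(\ell)\le\sum_{\ell\ge0}\mathrm V(\ell)\sum_{k\le\ell+1}2^k\le 4\sum_{\ell\ge0}2^\ell\,\mathrm V(\ell),$$

and then (recalling that $\mathrm V(\ell)\le\sqrt{u_{\ell+1}-u_\ell}\,\sqrt{\mu(]u_\ell,u_{\ell+1}])}$)

$$\sum_{\ell\ge0}2^\ell\,\mathrm V(\ell)\le\sum_{\ell\ge0}2^\ell\sqrt{u_{\ell+1}-u_\ell}\,\sqrt{\mu(]u_\ell,u_{\ell+1}])}<\infty$$

using the Cauchy-Schwarz inequality and (16.18). Since the bound obtained does not depend on $\mu$, (c) holds. □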
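The reformulation (16.18) can be checked numerically. The sketch below is illustrative only. It assumes a finitely supported sequence ($a_m = 0$ for $m>2^{2^{k_{\max}}}$, so that $t^* = u_{k_{\max}}$) and computes $u_k = t_{2^{2^k}}$ and $C = \sum_k 2^{2k}(u_{k+1}-u_k)$. It then verifies $t^*-u_k\le C2^{-2k}$. It also verifies $C\le(\log2)^{-2}\sum_{m\ge2}a_m^2(\log m)^2$, which is (16.18) with the explicit constant $L = (\log 2)^{-2}$ coming from $2^k<\log m/\log 2$ for $m>2^{2^k}$. The names and the example sequence are hypothetical.

```python
import math

def u_values(a_sq, k_max):
    """t_n = sum_{m<=n} a_m^2 (with t_0 = 0) and u_k = t_{2^(2^k)} for 0 <= k <= k_max."""
    t = [0.0]
    for v in a_sq:
        t.append(t[-1] + v)
    return t, [t[2 ** (2 ** k)] for k in range(k_max + 1)]

k_max = 4
M = 2 ** (2 ** k_max)                        # 65536 terms; a_m = 0 beyond M
a_sq = [1.0 / (m + 1) ** 2 for m in range(1, M + 1)]
t, u = u_values(a_sq, k_max)
t_star = t[M]                                # equals u[k_max] for this finite sequence
C = sum(2 ** (2 * k) * (u[k + 1] - u[k]) for k in range(k_max))
rm_sum = sum(a_sq[m - 1] * math.log(m) ** 2 for m in range(2, M + 1))
print(C, rm_sum / math.log(2) ** 2)
for k in range(k_max + 1):
    print(k, t_star - u[k], C * 2.0 ** (-2 * k))
```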


Corollary 16.3.2 (Tandori [135]) If for each choice of the orthonormal sequence
$(\varphi_m)$ the series $\sum_{m\ge1}a_m\varphi_m$ converges a.s., then

$$\sum_{m\ge1}a_m^2(\log|a_m|)^2<\infty. \tag{16.21}$$


Before we start the preparation for the proof, let us introduce some notation that
will be used throughout the book. We will write in a standard way expressions such
as $\sum_{i\le n}a_i$ or $\sum_{i\in I}a_i$. However, when we want in the same line to describe both the
summation and the set over which the summation occurs, we will write $\sum\{a_i ;\ i\in I\}$,
with the expression $i\in I$ replaced if necessary by the description of the set,
as is done in (16.22). We will use the convention not only for sums but also for
inf, sup, max, and min (where it is more standard). Using this convention for $k\ge0$,
we write

$$b_k = \sum\{a_m^2 ;\ 2^{-2^{k+1}}<a_m^2\le 2^{-2^k}\}, \tag{16.22}$$
which as explained means $b_k = \sum_{m\in U_k}a_m^2$, where

$$U_k = \{m\ge1 ;\ 2^{-2^{k+1}}<a_m^2\le 2^{-2^k}\}. \tag{16.23}$$


Lemma 16.3.3 For $n\ge1$, let $t_n = \sum_{m\le n}a_m^2$. Fix $k$ with $b_k>0$. Consider the
probability measure $\mu_k$ on $T$ given by $\mu_k(\{t_n\}) = a_n^2/b_k$ if $n\in U_k$ and $\mu_k(\{t_n\}) = 0$
if $n\notin U_k$. Consider $n$ with $2^{k-1}\le n<2^k$. Then

$$\sum_{I\in\mathcal I_n}\sqrt{2^{-n}\mu_k(I)}\ge\sqrt{\frac{b_k}{2}}. \tag{16.24}$$

Proof To prove (16.24), it suffices to prove that if $2^{k-1}\le n<2^k$, then for each
$I\in\mathcal I_n$,

$$\mu_k(I)\le\frac{2^{-n+1}}{b_k}, \tag{16.25}$$

because then $\sqrt{2^{-n}\mu_k(I)}\ge\mu_k(I)\sqrt{b_k/2}$, from which (16.24) follows by summation
over $I\in\mathcal I_n$ (recall that $\sum_{I\in\mathcal I_n}\mu_k(I) = 1$). By definition of $\mu_k$, it suffices to prove that
$\sum\{a_m^2 ;\ m\in U_k,\ t_m\in I\}\le 2^{-n+1}$. Denoting $I = ]a,b]$, consider the interval $I' = ]a-2^{-2^k},b]$, so that
the length of $I'$ is $2^{-n}+2^{-2^k}\le 2^{-n+1}$. When $m\in U_k$ and $t_m\in I$, the interval $]t_{m-1},t_m]$
(where $t_0 = 0$) is entirely contained in $I'$, and its length is exactly $a_m^2\le 2^{-2^k}$. As these intervals are
disjoint as $m$ varies, this shows that $\sum\{a_m^2 ;\ m\in U_k,\ t_m\in I\}$ is at most the length of $I'$. □
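Inequalities (16.24) and (16.25) can be verified exactly on an example, using rational arithmetic (`fractions.Fraction`) so that membership of $t_m$ in the half-open dyadic intervals is decided without rounding. In the sketch below, the names are hypothetical and the sequence is an arbitrary example with $\sum_m a_m^2\le1$. The sets $U_k$, the numbers $b_k$ and the measures $\mu_k$ are built as in (16.23) and Lemma 16.3.3. For each $n$ with $2^{k-1}\le n<2^k$, the script checks $\max_I\mu_k(I)\le 2^{-n+1}/b_k$ and $\sum_I\sqrt{2^{-n}\mu_k(I)}\ge\sqrt{b_k/2}$.

```python
import math
from fractions import Fraction

def lemma_16_3_3(a_sq, k):
    """For each n with 2^(k-1) <= n < 2^k, return (n, [max_I mu_k(I) <= 2^(-n+1)/b_k],
    sum_I sqrt(2^-n mu_k(I)), sqrt(b_k/2)); here a_sq[m-1] = a_m^2 as Fractions and k >= 1."""
    if k < 1:
        raise ValueError("k must be >= 1")
    lo = Fraction(1, 2 ** (2 ** (k + 1)))
    hi = Fraction(1, 2 ** (2 ** k))
    t, s = [], Fraction(0)
    for v in a_sq:
        s += v
        t.append(s)                                       # t[m-1] = t_m
    U = [m for m, v in enumerate(a_sq) if lo < v <= hi]  # 0-based indices of U_k
    b = sum((a_sq[m] for m in U), Fraction(0))
    if b == 0:
        raise ValueError("b_k = 0: mu_k is undefined")
    out = []
    for n in range(2 ** (k - 1), 2 ** k):
        mass = {}
        for m in U:
            i = math.ceil(t[m] * 2 ** n)                  # t lies in ](i-1)2^-n, i 2^-n]
            mass[i] = mass.get(i, Fraction(0)) + a_sq[m] / b
        ok = max(mass.values()) <= Fraction(2, 2 ** n) / b
        lhs = sum(math.sqrt(float(w) / 2 ** n) for w in mass.values())
        out.append((n, ok, lhs, math.sqrt(float(b) / 2)))
    return out

a_sq = [Fraction(1, 40 + 3 * m) for m in range(1, 31)] + [Fraction(1, 300 + m) for m in range(1, 30)]
for row in lemma_16_3_3(a_sq, 2):
    print(row)
```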


Proof of Corollary 16.3.2 Since $|\log|a_m||\le L2^k$ for $m\in U_k$ and $b_k = \sum_{m\in U_k}a_m^2$,
it suffices to prove that $\sum_{k\ge0}2^{2k}b_k = \sum_{k\in J}2^{2k}b_k<\infty$, where $J = \{k\ge0 ;\ b_k>0\}$.

By Theorem 16.2.1, we know that (16.12) holds. We will apply this condition to a
suitable probability measure $\mu$, which we construct now. Consider numbers $(\alpha_k)_{k\in J}$
with $\alpha_k\ge0$ and $\sum_{k\in J}\alpha_k = 1$. Consider the probability measure $\mu = \sum_{k\in J}\alpha_k\mu_k$,
where $\mu_k$ was described in Lemma 16.3.3. Consider $n$ with $2^{k-1}\le n<2^k$.
Using (16.24) in the second inequality, we obtain

$$\sum_{I\in\mathcal I_n}\sqrt{2^{-n}\mu(I)}\ge\sqrt{\alpha_k}\sum_{I\in\mathcal I_n}\sqrt{2^{-n}\mu_k(I)}\ge\sqrt{\frac{\alpha_kb_k}{2}}. \tag{16.26}$$

We then sum (16.26) over $2^{k-1}\le n<2^k$ and then over $k$ to obtain, using
also (16.12) in the first inequality,

$$B\ge\sum_{n\ge0}\sum_{I\in\mathcal I_n}\sqrt{2^{-n}\mu(I)}\ge\sum_{k\in J}2^{k-1}\sqrt{\frac{\alpha_kb_k}{2}},$$

and since the sequence $(\alpha_k)$ is arbitrary with $\sum_k\alpha_k = 1$, optimization over this
sequence yields $\sum_{k\in J}2^{2k}b_k\le 8B^2$. □
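The optimization over $(\alpha_k)$ is the equality case of the Cauchy-Schwarz inequality. Writing $c_k = 2^{2k}b_k/8$, the last lower bound equals $\sum_k\sqrt{\alpha_kc_k}\le(\sum_k\alpha_k)^{1/2}(\sum_kc_k)^{1/2} = (\sum_kc_k)^{1/2}$, with equality for $\alpha_k = c_k/\sum_jc_j$, whence $\sum_kc_k\le B^2$. A tiny Python check of this fact follows; the numbers $c_k$ are arbitrary and the names hypothetical.

```python
import math
import random

def objective(alpha, c):
    """sum_k sqrt(alpha_k c_k)."""
    return sum(math.sqrt(a * ck) for a, ck in zip(alpha, c))

def optimal_weights(c):
    """Maximizer of the objective over probability vectors: alpha_k proportional to c_k."""
    s = sum(c)
    return [ck / s for ck in c]

rng = random.Random(2)
c = [rng.uniform(0.0, 5.0) for _ in range(8)]
best = objective(optimal_weights(c), c)
others = []
for _ in range(1000):
    w = [rng.random() for _ in c]
    s = sum(w)
    others.append(objective([x / s for x in w], c))
print(best, math.sqrt(sum(c)), max(others))
```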


The necessary condition (16.21) is by no means sufficient for the convergence of
each series $\sum_{m\ge1}a_m\varphi_m$. This is obvious from Theorem 16.7.1.


Exercise 16.3.4 Prove that the conditions (16.21) and (16.16) are equivalent when
the sequence $(|a_m|)$ is non-increasing. Consequently, (16.21) is a necessary and
sufficient condition so that one can find a permutation $\sigma$ such that the series
$\sum_{m\ge1}a_{\sigma(m)}\varphi_m$ converges a.s. for each orthonormal sequence $(\varphi_m)$.


Exercise 16.3.5 For a finite subset $T$ of $]0,1]$, consider the following quantity
$M(T)$. If $\operatorname{card}T = 1$, we set $M(T) = 0$. Otherwise, let $n(T)$ be the largest integer
such that there exists $I\in\mathcal I_{n(T)}$ for which $T\subset I$. Call this interval $I_T$. Define then

$$M(T) = \inf_\mu\sup_{t\in T}\sum_{n>n(T)}\sqrt{\frac{2^{-n}}{\mu(I_n(t))}},$$

where the infimum is computed over all choices of probability measures $\mu$ on $T$. Now,
$I_T$ is the union of two intervals $I_1$ and $I_2$ of $\mathcal I_{n(T)+1}$. Explain how to compute $M(T)$
when you know $M(T\cap I_j)$ for $j = 1,2$. In this manner, the quantity $M(T)$ can be
“computed recursively”.


16.4 Approach to Paszkiewicz’s Theorem: Bednorz’s Theorem


We now describe our approach to Theorem 16.2.1. The following is an obvious 
consequence of orthonormality: 


Lemma 16.4.1 For $t = \sum_{m\le n}a_m^2\in T$, let us define

$$X_t = \sum_{m\le n}a_m\varphi_m. \tag{16.27}$$

Then

$$\forall s,t\in T,\quad\mathbb E(X_s-X_t)^2 = |s-t|. \tag{16.28}$$


This makes it obvious that (e) implies (a) in Theorem 16.2.1. It also motivates 
the following: 


Definition 16.4.2 If $T$ is a subset of $[0,1]$, we say that the process $(X_t)_{t\in T}$ is
orthonormal if it satisfies (16.28) and if moreover $\mathbb EX_t = 0$ for each $t$.


The main ingredient in the proof of Theorem 16.2.1 is the following result: 
Theorem 16.4.3 (W. Bednorz [13]) Consider a finite subset $T$ of $]0,1]$, and define

$$F^*(T) = \sup\mathbb E\sup_{t\in T}X_t, \tag{16.29}$$

where the first supremum is taken over all orthonormal processes indexed by $T$.
Then for each probability measure $\mu$ on $T$, we have

$$\sum_{n\ge0}\sum_{I\in\mathcal I_n}\sqrt{2^{-n}\mu(I)}\le L(1+F^*(T)). \tag{16.30}$$


Our first task is to make the link between Theorems 16.2.1 and 16.4.3. 


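Lemma 16.4.4 If the process $(X_t)_{t\in T}$ is orthonormal, then

$$t_1<t_2\le t_3<t_4\in T\ \Rightarrow\ \mathbb E(X_{t_2}-X_{t_1})(X_{t_4}-X_{t_3}) = 0. \tag{16.31}$$

Proof Consider $t_1<t_2<t_3\in T$. Then

$$\begin{aligned}
t_3-t_1 &= \mathbb E(X_{t_3}-X_{t_1})^2\\
&= \mathbb E(X_{t_3}-X_{t_2})^2+\mathbb E(X_{t_2}-X_{t_1})^2+2\mathbb E(X_{t_3}-X_{t_2})(X_{t_2}-X_{t_1})\\
&= t_3-t_2+t_2-t_1+2\mathbb E(X_{t_3}-X_{t_2})(X_{t_2}-X_{t_1}),
\end{aligned}$$

so that we have proved

$$t_1<t_2<t_3\in T\ \Rightarrow\ \mathbb E(X_{t_3}-X_{t_2})(X_{t_2}-X_{t_1}) = 0. \tag{16.32}$$

We use (16.32) to write

$$\begin{aligned}
0 &= \mathbb E(X_{t_4}-X_{t_2})(X_{t_2}-X_{t_1})\\
&= \mathbb E(X_{t_4}-X_{t_3})(X_{t_2}-X_{t_1})+\mathbb E(X_{t_3}-X_{t_2})(X_{t_2}-X_{t_1})\\
&= \mathbb E(X_{t_4}-X_{t_3})(X_{t_2}-X_{t_1}),
\end{aligned}$$

using again (16.32) in the last equality (when $t_2 = t_3$, the second term is trivially $0$). □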
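Lemmas 16.4.1 and 16.4.4 can be checked exactly on the simplest orthonormal sequence: the coordinate functions $\varphi_m(\omega) = \omega_m$ on $\{-1,1\}^M$ with the uniform probability. This is an illustrative choice, not the general case. The Python sketch below (hypothetical names, rational coefficients so that all expectations are exact) verifies (16.28) for all pairs of points of $T$ and (16.31) for all $t_1<t_2\le t_3<t_4$ in $T$.

```python
from fractions import Fraction
from itertools import product

M = 5
a = [Fraction(1, m + 1) for m in range(1, M + 1)]    # a_1, ..., a_M (example values)
omegas = list(product((-1, 1), repeat=M))             # uniform probability on {-1,1}^M

def t(n):
    """t_n = sum_{m <= n} a_m^2."""
    return sum((a[m] ** 2 for m in range(n)), Fraction(0))

def X(n, omega):
    """X_{t_n}(omega) = sum_{m <= n} a_m phi_m(omega), with phi_m(omega) = omega_m."""
    return sum((a[m] * omega[m] for m in range(n)), Fraction(0))

def E(f):
    """Exact expectation of f under the uniform measure on {-1,1}^M."""
    return sum((f(w) for w in omegas), Fraction(0)) / len(omegas)

check_16_28 = all(E(lambda w: (X(n, w) - X(p, w)) ** 2) == abs(t(n) - t(p))
                  for n in range(1, M + 1) for p in range(1, M + 1))
check_16_31 = all(E(lambda w: (X(n2, w) - X(n1, w)) * (X(n4, w) - X(n3, w))) == 0
                  for n1, n2, n3, n4 in product(range(1, M + 1), repeat=4)
                  if n1 < n2 <= n3 < n4)
print(check_16_28, check_16_31)   # True True
```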


We will also need a classical result of Tandori [136]. This lemma really brings 
out the strength of the statement “for every orthonormal sequence. ..”. 


Lemma 16.4.5 Assume that the sequence $(a_m)$ has the property that for every
orthonormal sequence $(\varphi_m)$, the series $\sum_{m\ge1}a_m\varphi_m$ converges a.s. Then there exists
a number $A$ such that for each orthonormal sequence $(\varphi_m)$, we have

$$\mathbb E\sup_{n\ge1}\Big(\sum_{1\le m\le n}a_m\varphi_m\Big)^2\le A. \tag{16.33}$$
Proof For $1 \le p < q$, let us define

$$V(p,q) = \sup \mathrm{E}\sup_{p\le n<q}\Bigl(\sum_{p\le m\le n} a_m\varphi_m\Bigr)^2 , \qquad (16.34)$$

where the supremum is also taken over all orthonormal sequences $(\varphi_m)$. Let us assume for contradiction that (16.33) fails, i.e., that $\lim_{q\to\infty} V(1,q) = \infty$. The inequality $(a+b)^2 \le 2a^2 + 2b^2$ shows that $2(\sum_{p\le m\le n} a_m\varphi_m)^2 \ge (\sum_{1\le m\le n} a_m\varphi_m)^2 - 2(\sum_{1\le m<p} a_m\varphi_m)^2$. Consequently, since $\mathrm{E}(\sum_{1\le m<p} a_m\varphi_m)^2 = \sum_{1\le m<p} a_m^2 < \infty$, for each $p$ we have $\lim_{q\to\infty} V(p,q) = \infty$, and therefore, we can find an increasing sequence $(p_k)$ such that $V(p_k,p_{k+1}) > 1$ for each $k$. By definition of $V(p,q)$, we can then find an orthonormal sequence $(\varphi_{m,k})_{m\ge 1}$ for which


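$$W(k) := \max_{p_k\le n<p_{k+1}}\Bigl|\sum_{p_k\le m\le n} a_m\varphi_{m,k}\Bigr|$$

satisfies $\mathrm{E}W(k)^2 > 1$. Let us define the function

$$\theta_k = \frac{W(k)^2}{\mathrm{E}W(k)^2} ,$$

so that $\mathrm{E}\theta_k = 1$. For $p_k \le m < p_{k+1}$, we have $|a_m\varphi_{m,k}| \le 2W(k)$ since $a_m\varphi_{m,k} = \sum_{p_k\le s\le m} a_s\varphi_{s,k} - \sum_{p_k\le s\le m-1} a_s\varphi_{s,k}$. Consequently, $\varphi_{m,k} = 0$ when $\theta_k = 0$. For $p_k \le m < p_{k+1}$, we may then define $\varphi'_{m,k} = \varphi_{m,k}/\sqrt{\theta_k}$. Since

$$\int \varphi'_{m,k}\varphi'_{m',k}\,\theta_k\,\mathrm{d}P = \int \varphi_{m,k}\varphi_{m',k}\,\mathrm{d}P ,$$

the functions $(\varphi'_{m,k})_{p_k\le m<p_{k+1}}$ still form an orthonormal sequence for the probability measure $P' = \theta_k P$ and satisfy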


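$$\max_{p_k\le n<p_{k+1}}\Bigl|\sum_{p_k\le m\le n} a_m\varphi'_{m,k}\Bigr| = \frac{W(k)}{\sqrt{\theta_k}} = \bigl(\mathrm{E}W(k)^2\bigr)^{1/2} > 1 . \qquad (16.35)$$

We can moreover assume that $\mathrm{E}\varphi'_{m,k} = 0$. To see this, we replace the sequence $(\varphi'_{m,k})_{p_k\le m<p_{k+1}}$ by the sequence $(\varepsilon\varphi'_{m,k})_{p_k\le m<p_{k+1}}$, where $\varepsilon$ is a Bernoulli r.v. independent of all the r.v.s $\varphi'_{m,k}$.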

We construct a sequence $(\psi_m)_{m\ge 1}$ using independent blocks: for each $k \ge 1$, the sequence $(\psi_m)_{p_k\le m<p_{k+1}}$ is a copy of the sequence $(\varphi'_{m,k})_{p_k\le m<p_{k+1}}$, and these sequences are globally independent as $k$ varies. The sequence $(\psi_m)_{m\ge p_1}$ is orthonormal. Indeed, if $m \ne m'$ and $p_k \le m, m' < p_{k+1}$ for some $k$, then $\mathrm{E}\psi_m\psi_{m'} = \mathrm{E}\varphi'_{m,k}\varphi'_{m',k} = 0$, while if this is not the case, $\psi_m$ and $\psi_{m'}$ are independent and of expectation 0. We complete it in any way we like into an orthonormal sequence $(\psi_m)_{m\ge 1}$ (e.g., we set $\psi_m = \varepsilon_m$ for $1 \le m < p_1$, where $(\varepsilon_m)$ is an independent Bernoulli sequence). According to (16.35), the series $\sum_{m\ge 1} a_m\psi_m$ diverges a.s. This contradiction shows that (16.33) must hold and concludes the proof. □


Corollary 16.4.6 Under the hypothesis of Lemma 16.4.5, for each orthonormal process $(X_t)_{t\in T}$, one has

$$\mathrm{E}\sup_{s,t\in T}|X_s - X_t| \le 2\sqrt{A} . \qquad (16.36)$$


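Proof We recall that $T = \{t_1, t_2, \ldots\}$ where $t_n = \sum_{m\le n} a_m^2$. It follows from Lemma 16.4.4 that the sequence $(\varphi_m)_{m\ge 2}$ given by

$$\varphi_m = a_m^{-1}(X_{t_m} - X_{t_{m-1}})$$

is an orthonormal sequence. If $p < q$, then $X_{t_q} - X_{t_p} = \sum_{p<m\le q} a_m\varphi_m$, so that

$$\sup_{s,t\in T}|X_s - X_t| \le \sup_{p<q}\Bigl|\sum_{p<m\le q} a_m\varphi_m\Bigr| \le 2\sup_{n}\Bigl|\sum_{2\le m\le n} a_m\varphi_m\Bigr| ,$$

and (16.33) implies (16.36). □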
Assuming Theorem 16.4.3, we are now ready to prove the “hard part” of 
Theorem 16.2.1. 


Theorem 16.4.7 If the series (16.9) converges a.s. for each choice of the orthonormal sequence $(\varphi_n)$, or if (d) of Theorem 16.2.1 holds, then there exists a number $B$ such that for each probability measure $\mu$ on $T$, one has

$$\sum_{n\ge 0}\sum_{I\in\mathcal{I}_n}\sqrt{2^{-n}\mu(I)} \le B . \qquad (16.37)$$


Proof Assuming first that the series (16.9) converges a.s. for each choice of the orthonormal sequence $(\varphi_n)$, it follows from (16.36) (and from $\mathrm{E}\sup_t X_t = \mathrm{E}\sup_t (X_t - X_{t_1})$ for a centered process) that the quantity $F^*(T)$ of (16.29) satisfies $F^*(T) \le 2\sqrt{A}$.

Consequently, Theorem 16.4.3 implies that for each probability measure $\mu$ on a finite subset of $T$, one has

$$\sum_{n\ge 0}\sum_{I\in\mathcal{I}_n}\sqrt{2^{-n}\mu(I)} \le B := L(1+\sqrt{A}) .$$

It should then be obvious that this implies the same inequality for each probability measure $\mu$ on $T$.
Assuming now that (d) holds, one has $F^*(T) \le B'$ and the proof is the same. □


We shall also use the following, which is a variation on Lemma 3.3.3: 
Lemma 16.4.8 Consider a finite set $T \subset [0,1]$, and assume that for each probability measure $\mu$ on $T$, one has

$$\sum_{n\ge 0}\sum_{I\in\mathcal{I}_n}\sqrt{2^{-n}\mu(I)} \le B . \qquad (16.38)$$

Then there is a probability measure $\mu$ on $T$ for which²

$$\sup_{t\in T}\sum_{n\ge 0}\frac{1}{\sqrt{2^n\mu(I_n(t))}} \le 2B . \qquad (16.39)$$


Proof The key argument is the Hahn-Banach theorem, in the form of Lemma 3.3.2. Let us denote by $M(T)$ the set of probability measures on $T$, and for $\mu \in M(T)$, let us consider the function

$$f_\mu(t) = \sum_{n\ge 0}\frac{1}{\sqrt{2^n\mu(I_n(t))}} .$$

Since $I_n(t) = I$ for $t \in I \in \mathcal{I}_n$, we have $\int_I \bigl(1/\sqrt{\mu(I_n(t))}\bigr)\,\mathrm{d}\mu(t) = \sqrt{\mu(I)}$, so that, using (16.38) in the inequality,

$$\int f_\mu(t)\,\mathrm{d}\mu(t) = \sum_{n\ge 0}\sum_{I\in\mathcal{I}_n}\sqrt{2^{-n}\mu(I)} \le B . \qquad (16.40)$$

Since the function $x \mapsto 1/\sqrt{x}$ is convex, the map $\mu \mapsto f_\mu$ is convex. Consequently, the class $\mathcal{C}$ of functions $f$ on $T$ that satisfy

$$\exists \mu \in M(T) ;\ \forall t \in T ,\ f_\mu(t) \le f(t)$$

is convex. For each probability measure $\mu$ on $T$, (16.40) shows that there exists $f$ in $\mathcal{C}$ (namely, $f = f_\mu$) with $\int f\,\mathrm{d}\mu \le B$. Consequently, by Lemma 3.3.2 (used for $\varepsilon = B$), there exists $f \in \mathcal{C}$ such that $f \le 2B$. □
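Since (16.40) holds level by level, it can be checked numerically after truncating the sums at some level $N$. The sketch below assumes $T \subset\, ]0,1]$ and dyadic intervals of the form $]k2^{-n},(k+1)2^{-n}]$; the helper names are hypothetical, and points are stored as exact fractions so that membership in a dyadic interval is decided exactly.

```python
import math
from collections import defaultdict
from fractions import Fraction


def dyadic_index(t, n):
    """Index k of the interval ]k 2^-n, (k+1) 2^-n] of I_n containing t in ]0,1]."""
    return math.ceil(t * 2**n) - 1


def level_masses(mu, n):
    """mu(I) for every I in I_n that meets the support of mu."""
    masses = defaultdict(float)
    for t, w in mu.items():
        masses[dyadic_index(t, n)] += w
    return masses


def f_mu(mu, t, N):
    """Truncation at level N of f_mu(t) = sum_n 1/sqrt(2^n mu(I_n(t)))."""
    return sum(1.0 / math.sqrt(2**n * level_masses(mu, n)[dyadic_index(t, n)])
               for n in range(N + 1))


def size_functional(mu, N):
    """Truncation at level N of sum_n sum_{I in I_n} sqrt(2^-n mu(I))."""
    return sum(math.sqrt(m / 2**n)
               for n in range(N + 1) for m in level_masses(mu, n).values())


mu = {Fraction(1, 3): 0.5, Fraction(1, 2): 0.25, Fraction(7, 8): 0.25}
lhs = sum(w * f_mu(mu, t, 12) for t, w in mu.items())  # integral of f_mu against mu
rhs = size_functional(mu, 12)
print(abs(lhs - rhs) < 1e-9)
```

The truncation does not affect the identity, because each level $n$ contributes the same amount to both sides.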


Proposition 16.4.9 If condition (c) of Theorem 16.2.1 holds, so does condition (b). 


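Proof Let $t_k = \sum_{n\le k} a_n^2$ and $t^* = \lim_{k\to\infty} t_k = \sum_{n\ge 1} a_n^2$, so that $T^* = T \cup \{t^*\}$ is compact. Consider $T_k = \{t_n ;\ n \le k\}$. Combining (c) with Lemma 16.4.8, we obtain a probability measure $\mu_k$ on $T_k$ for which

$$\sup_{t\in T_k}\sum_{n\ge 0}\frac{1}{\sqrt{2^n\mu_k(I_n(t))}} \le 2B . \qquad (16.41)$$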


2 If one works a tad harder, one may get $B$ rather than $2B$ in the next inequality.


506 16 Convergence of Orthogonal Series: Majorizing Measures 


From here, the proof is basically a compactness argument. Taking a subsequence if necessary, we may assume that the sequence $(\mu_k)$ converges weakly as $k \to \infty$ to a probability measure $\mu'$ on $T^*$. Then for each compact subset $K$ of $T^*$, we have $\mu'(K) \ge \limsup_k \mu_k(K)$. This applies in particular to the sets $I \cap T^*$ for $I \in \mathcal{I}_n$, which are compact. It then follows from (16.41) that

$$\sup_{t\in T}\sum_{n\ge 0}\frac{1}{\sqrt{2^n\mu'(I_n(t))}} \le 2B . \qquad (16.42)$$


The problem is that it might happen that $\mu'(\{t^*\}) > 0$, and then $\mu'$ is not supported by $T$. We modify $\mu'$ to take care of this problem. We consider a probability measure $\mu$ of the form $\mu = \mu'/2 + \mu_1$, where $\mu_1$ is a positive measure on $T$ of mass $1/2$ which for each $n$ gives mass $\ge 2^{-n/2}/L$ to the interval $I \in \mathcal{I}_n$ which contains $t^*$. This is possible because this interval is of the type $]u,v]$, so that it meets $T$.

Then, for $I \in \mathcal{I}_n$, we have $\mu(I) \ge \mu'(I)/2$ if $t^* \notin I$, while if $t^* \in I$, then $\mu(I) \ge 2^{-n/2}/L$. It is then immediate to check that $\mu$ satisfies (16.11). □


16.5 Chaining, I 


We need one more ingredient to complete the proof of Theorem 16.2.1 (still assuming Theorem 16.4.3). We need to control the supremum of a stochastic process under condition (16.13). Theorem 16.1.3 does not suffice for this purpose. In this section, we develop a more efficient chaining scheme. We consider a general finite metric space $(T,d)$, and we try to bound a process $(X_t)_{t\in T}$ which satisfies

$$\forall s,t\in T ,\quad \mathrm{E}(X_s - X_t)^2 \le d(s,t)^2 . \qquad (16.43)$$


When $T$ is a subset of the unit interval and when $d(s,t) = \sqrt{|s-t|}$, this covers the case of the processes satisfying (16.13).

Consider a sequence $(T_n)_{n\ge 0}$ of subsets of $T$. We assume that $\operatorname{card} T_0 = 1$, and we denote by $t_0$ the unique element of $T_0$. We assume that for each $n \ge 1$, we are given a map $\theta_n : T_n \to T_{n-1}$. As in the proof of Theorem 16.1.3, this map will help us to build a proper chaining. Since we assume that $T$ is finite, it is not much of a restriction to assume that $T_m = T$ for a certain (large) integer $m$. We define $\pi_m(t) = t$ for each $t$ and recursively $\pi_{n-1}(t) = \theta_n(\pi_n(t))$. First, as usual, we write

$$|X_t - X_{t_0}| \le \sum_{1\le n\le m}|X_{\pi_n(t)} - X_{\pi_{n-1}(t)}| . \qquad (16.44)$$
Using the inequality $xy \le x^2 + y^2$, it is rather natural to write that, for $s \in T_n$, and introducing a parameter $c_n(s) > 0$,

$$|X_s - X_{\theta_n(s)}| \le \frac{d(s,\theta_n(s))}{c_n(s)} + d(s,\theta_n(s))\,c_n(s)\Bigl(\frac{X_s - X_{\theta_n(s)}}{d(s,\theta_n(s))}\Bigr)^2 .$$
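For the reader who wants the elementary step spelled out (a worked check, not part of the original argument): writing $d = d(s,\theta_n(s))$, $c = c_n(s)$ and $Y = X_s - X_{\theta_n(s)}$, the bound above is $xy \le x^2 + y^2$ applied with $x = \sqrt{d/c}$ and $y = \sqrt{dc}\,|Y|/d$, whose product is exactly $|Y|$:

```latex
$$|Y| = d\cdot\frac{|Y|}{d} = \sqrt{\frac{d}{c}}\cdot\Bigl(\sqrt{dc}\,\frac{|Y|}{d}\Bigr) \le \frac{d}{c} + dc\Bigl(\frac{Y}{d}\Bigr)^2 .$$
```

The parameter $c$ trades the deterministic term $d/c$ against the random term, whose expectation is at most $dc$ by (16.43).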


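Let us assume for simplicity that for numbers $\varepsilon_n > 0$, we have

$$\forall s \in T_n ,\quad d(s,\theta_n(s)) \le \varepsilon_n . \qquad (16.45)$$

Then

$$|X_s - X_{\theta_n(s)}| \le \frac{\varepsilon_n}{c_n(s)} + \varepsilon_n c_n(s)\Bigl(\frac{X_s - X_{\theta_n(s)}}{d(s,\theta_n(s))}\Bigr)^2 .$$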


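Using this for $s = \pi_n(t)$, and recalling that $\pi_{n-1}(t) = \theta_n(\pi_n(t))$, we obtain (using a crude bound on the last term to make it independent of $t$)

$$|X_{\pi_n(t)} - X_{\pi_{n-1}(t)}| \le \frac{\varepsilon_n}{c_n(\pi_n(t))} + \sum_{s\in T_n}\varepsilon_n c_n(s)\Bigl(\frac{X_s - X_{\theta_n(s)}}{d(s,\theta_n(s))}\Bigr)^2 .$$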


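We then deduce from (16.44)

$$|X_t - X_{t_0}| \le \sum_{1\le n\le m}\frac{\varepsilon_n}{c_n(\pi_n(t))} + \sum_{1\le n\le m}\sum_{s\in T_n}\varepsilon_n c_n(s)\Bigl(\frac{X_s - X_{\theta_n(s)}}{d(s,\theta_n(s))}\Bigr)^2 . \qquad (16.46)$$

Let us now set

$$S = \sup_{t\in T}\sum_{1\le n\le m}\frac{\varepsilon_n}{c_n(\pi_n(t))} , \qquad (16.47)$$

$$S^* = \sum_{1\le n\le m}\sum_{s\in T_n}\varepsilon_n c_n(s) . \qquad (16.48)$$

Then (16.46) yields

$$\sup_{t\in T}|X_t - X_{t_0}| \le S + \sum_{1\le n\le m}\sum_{s\in T_n}\varepsilon_n c_n(s)\Bigl(\frac{X_s - X_{\theta_n(s)}}{d(s,\theta_n(s))}\Bigr)^2 . \qquad (16.49)$$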


Taking expectation and using (16.43), we obtain the following important relation: 
Lemma 16.5.1 Recalling (16.47) and (16.48), we have

$$\mathrm{E}\sup_{t\in T}|X_t - X_{t_0}| \le S + S^* . \qquad (16.50)$$


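Corollary 16.5.2 We have

$$\mathrm{E}\sup_{t\in T}|X_t - X_{t_0}| \le L\sum_{n\ge 1}\varepsilon_n\sqrt{\operatorname{card} T_n} . \qquad (16.51)$$

Proof Choose $c_n(t) = 1/\sqrt{\operatorname{card} T_n}$ for $t \in T_n$; then both $S$ and $S^*$ are at most $\sum_{n\ge 1}\varepsilon_n\sqrt{\operatorname{card} T_n}$. □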


Exercise 16.5.3 Prove that (16.51) implies Pisier’s bound (16.5) in the case $\varphi(x) = x^2$.


We recall the notation $I_n(t)$ of Theorem 16.2.1.


Corollary 16.5.4 Consider a countable subset $T$ of $[0,1]$. Assume that for a certain integer $n_0 \ge 0$ and a certain $I_0 \in \mathcal{I}_{n_0}$, we have $T \subset I_0$. Consider a probability measure $\mu$ on $T$ such that

$$A := \sup_{t\in T}\sum_{n\ge n_0}\frac{1}{2^{n/2}\sqrt{\mu(I_n(t))}} < \infty .$$

Then for each process $(X_t)_{t\in T}$ that satisfies (16.13), we have

$$\mathrm{E}\sup_{s,t\in T}|X_s - X_t| \le LA . \qquad (16.52)$$


Proof Since the process satisfies (16.13), it satisfies (16.43) for $d(s,t) = \sqrt{|s-t|}$. The plan is to use (16.50), and we construct the relevant chaining. We construct inductively for $n \ge n_0$ a set $T_n \subset T$ such that $\operatorname{card}(T_n \cap I) = 1$ whenever $I \in \mathcal{I}_n$ and $I \cap T \ne \emptyset$, and such that moreover $T_{n-1} \subset T_n$. When $s$ is the unique point of $T_n \cap I$, let us then set

$$c_n(s) = \sqrt{\mu(I)} .$$

Let us moreover define the map $\theta_n : T_n \to T_{n-1}$ in the canonical manner. That is, if $s$ is the unique point of $T_n \cap I$ where $I \in \mathcal{I}_n$, there are a unique $I' \in \mathcal{I}_{n-1}$ with $I \subset I'$ and a unique point $s'$ in $T_{n-1} \cap I'$. We then set $\theta_n(s) = s'$. We have $|s - \theta_n(s)| = |s - s'| \le 2^{-(n-1)}$, so that $d(s,\theta_n(s)) = \sqrt{|s - \theta_n(s)|} \le \varepsilon_n := 2^{-(n-1)/2}$, i.e., (16.45) holds for this value of $\varepsilon_n$.
Considering an arbitrary integer $m$, we now use the bound (16.49) for $T_m$ rather than $T$ (the chain now starts at the single point of $T_{n_0}$). Then, for $t \in T_m$, we have

$$\sum_{n_0<n\le m}\frac{\varepsilon_n}{c_n(\pi_n(t))} = \sum_{n_0<n\le m}\frac{2^{-(n-1)/2}}{\sqrt{\mu(I_n(t))}} \le 2A ,$$

so that $S \le 2A$. Also, integrating the inequality

$$\forall t \in T ,\quad \sum_{n\ge n_0}\frac{1}{2^{n/2}\sqrt{\mu(I_n(t))}} \le A$$

with respect to $\mu$, we obtain (using as always that $I_n(t) = I$ for $t \in I$)

$$\sum_{n\ge n_0}\sum_{I\in\mathcal{I}_n}2^{-n/2}\sqrt{\mu(I)} \le A .$$

This means that $S^* \le LA$. Consequently, the bound (16.50) implies

$$\mathrm{E}\sup_{s,t\in T_m}|X_s - X_t| \le LA ,$$

and since $m$ is arbitrary, this proves (16.52). □
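The chaining of Corollary 16.5.4 is concrete enough to compute. The following minimal sketch takes $n_0 = 0$ (so it assumes $T \subset\, ]0,1]$ with dyadic intervals $]k2^{-n},(k+1)2^{-n}]$), truncates at a level $m$, and in each occupied dyadic interval keeps the parent's point when it lies there and otherwise the smallest point of $T$ in it. That tie-breaking rule is an arbitrary choice; the text only requires $\operatorname{card}(T_n \cap I) = 1$ and $T_{n-1} \subset T_n$. It evaluates $S$ and $S^*$ from (16.47) and (16.48) with $\varepsilon_n = 2^{-(n-1)/2}$ and $c_n(s) = \sqrt{\mu(I)}$, and reports the largest ratio $d(s,\theta_n(s))/\varepsilon_n$, which (16.45) says is at most 1. All names are hypothetical.

```python
import math
from collections import defaultdict
from fractions import Fraction


def idx(t, n):
    # index k of the dyadic interval ]k 2^-n, (k+1) 2^-n] that contains t in ]0,1]
    return math.ceil(t * 2**n) - 1


def chaining_quantities(mu, m):
    """Return (S, S_star, A_m, max_ratio) for the dyadic chaining with n_0 = 0."""
    pts = sorted(mu)
    reps, mass = [], []
    for n in range(m + 1):
        level_mass = defaultdict(float)
        for t in pts:
            level_mass[idx(t, n)] += mu[t]
        cur = {}
        for t in pts:
            k = idx(t, n)
            if k in cur:
                continue
            parent = reps[n - 1].get(k // 2) if n > 0 else None
            # reuse the parent's point when it lies in this child, so T_{n-1} ⊂ T_n
            cur[k] = parent if parent is not None and idx(parent, n) == k else t
        reps.append(cur)
        mass.append(level_mass)
    eps = [2 ** (-(n - 1) / 2) for n in range(m + 1)]  # eps_n = 2^{-(n-1)/2}
    # S of (16.47), with c_n(pi_n(t)) = sqrt(mu(I_n(t)))
    S = max(sum(eps[n] / math.sqrt(mass[n][idx(t, n)]) for n in range(1, m + 1))
            for t in pts)
    # S* of (16.48): one term eps_n * sqrt(mu(I)) per occupied I in I_n
    S_star = sum(eps[n] * math.sqrt(w) for n in range(1, m + 1) for w in mass[n].values())
    # the quantity A of Corollary 16.5.4, truncated at level m
    A_m = max(sum(1 / (2 ** (n / 2) * math.sqrt(mass[n][idx(t, n)])) for n in range(m + 1))
              for t in pts)
    # largest d(s, theta_n(s)) / eps_n, with d(s, t) = sqrt(|s - t|)
    max_ratio = max(math.sqrt(abs(s - reps[n - 1][k // 2])) / eps[n]
                    for n in range(1, m + 1) for k, s in reps[n].items())
    return S, S_star, A_m, max_ratio


mu = {Fraction(1, 7): 0.4, Fraction(2, 5): 0.3, Fraction(3, 4): 0.2, Fraction(19, 20): 0.1}
S, S_star, A_m, ratio = chaining_quantities(mu, 10)
print(S <= math.sqrt(2) * A_m, S_star <= math.sqrt(2) * A_m, ratio <= 1)
```

The two comparisons with $\sqrt{2}A$ are how the proof obtains $S \le 2A$ and $S^* \le LA$: the factor $\sqrt{2}$ comes from $\varepsilon_n = \sqrt{2}\,2^{-n/2}$.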


Proof of Theorem 16.2.1 We proved that (c) implies (b) in Proposition 16.4.9. We 
proved that (b) implies (d) in Corollary 16.5.4. We proved that (d) implies (c) in 
Theorem 16.4.7. Thus, (b), (c), and (d) are equivalent. 

We proved that (e) implies (a) in Lemma 16.4.1. We proved that (a) implies (c) 
in Theorem 16.4.7. 

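We now prove that (b) implies (e), completing the proof of Theorem 16.2.1. Let us consider the point $t^* = \sum_{m\ge 1} a_m^2 = \lim_{k\to\infty} t_k$, the supremum of $T$. Let us consider an integer $n_0$ and the unique $I_0 \in \mathcal{I}_{n_0}$ with $t^* \in I_0$. Consider the set $T' = T \cap I_0$, so that $t_k \in T'$ for $k$ large enough. Then

$$\sup_{t\in T'}\sum_{n\ge n_0}\frac{1}{2^{n/2}\sqrt{\mu(I_n(t))}} \le A^* := \sup_{t\in T}\sum_{n\ge 0}\frac{1}{2^{n/2}\sqrt{\mu(I_n(t))}} .$$

Consequently, the probability measure $\mu'$ on $T'$ given for $B \subset T'$ by $\mu'(B) = \mu(B \cap T')/\mu(T') = \mu(B \cap I_0)/\mu(I_0)$ satisfies

$$\sup_{t\in T'}\sum_{n\ge n_0}\frac{1}{2^{n/2}\sqrt{\mu'(I_n(t))}} \le A^*\sqrt{\mu(I_0)} .$$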
The bound (16.52) used for $T'$ and $\mu'$ then implies that each process $(X_t)_{t\in T}$ which satisfies (16.13) also satisfies

$$\mathrm{E}\sup_{s,t\in T'}|X_s - X_t| \le LA^*\sqrt{\mu(I_0)} .$$

Now for $n_0$ large enough, $\mu(I_0)$ is arbitrarily small since $\bigcap_{\varepsilon>0}(T \cap\, ]t^*-\varepsilon, t^*]) = \emptyset$. Consequently,

$$\lim_{k\to\infty}\mathrm{E}\sup_{k\le p\le n}|X_{t_p} - X_{t_n}| = 0 .$$

This concludes the proof. □


16.6 Proof of Bednorz’s Theorem 


The main step in the proof of Bednorz’s theorem is, given a finite subset $T$ of $]0,1]$, to relate the “size” of $T$ with the size of the four sets $T_j = T \cap I_j$, where for $1 \le j \le 4$, $I_j$ is the interval $](j-1)/4, j/4]$. It is performed in Proposition 16.6.3. The reason why we use 4-adic partitions is that we are certain that “$T_1$ is far apart from $T_3$” (etc.), whereas one cannot say the same about, say, $T_1$ and $T_2$, since $T_1$ might be located to the very right of $I_1$ and $T_2$ might be located to the very left of $I_2$. (This is why dyadic partitions would not work.)


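Definition 16.6.1 Consider an interval $I = \,]c,d] \subset [0,1]$ and $J = [c,d]$. We say that the process $(X_t)_{t\in J}$ is normalized if $\mathrm{E}X_t = 0$, $X_c = X_d = 0$ and

$$\forall s,t \in J ,\ s < t ,\quad \mathrm{E}(X_s - X_t)^2 = t - s - \frac{(t-s)^2}{d-c} . \qquad (16.53)$$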


The reason behind the formula in the right-hand side of (16.53) will be explained soon. We fix the finite set $T \subset\, ]0,1]$ once and for all. For an interval $I = \,]c,d] \subset [0,1]$, we consider the quantity

$$F(I) = \sup \mathrm{E}\sup_{t\in T\cap I} X_t , \qquad (16.54)$$

where the first supremum is taken over all normalized processes indexed by $J = [c,d]$. Although $X_c = 0$ is defined, in (16.54), the supremum is only over $T \cap I$, not over $T \cap J$. We define $F(I) = 0$ when $T \cap I = \emptyset$.

The quantity $F(I)$ will be our “measure of size of $I$”. We first relate it to the quantity $F^*(T)$ of (16.29).

Lemma 16.6.2 We have $F(]0,1]) \le F^*(T)$.
Proof Consider a normalized process $(X_t)_{t\in[0,1]}$. Consider a centered r.v. $Z$, independent of this process, and such that $\mathrm{E}Z^2 = 1$. Then the process $Y_t = X_t + tZ$ is orthonormal. Using the definition of $F^*$ in the first inequality and using Jensen’s inequality (taking the expectation in $Z$ inside the supremum) in the second inequality yields

$$F^*(T) \ge \mathrm{E}\sup_{t\in T} Y_t \ge \mathrm{E}\sup_{t\in T} X_t ,$$

and since the normalized process $(X_t)$ is arbitrary, this proves that $F(]0,1]) \le F^*(T)$. □
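The orthonormality of $Y_t = X_t + tZ$ can be checked in one line (a worked step that the proof leaves implicit). Since $Z$ is centered, independent of $(X_t)$ and $\mathrm{E}Z^2 = 1$, for $s < t$ in $[0,1]$ the cross term vanishes, and (16.53) with $c = 0$, $d = 1$ gives

```latex
$$\mathrm{E}(Y_t - Y_s)^2 = \mathrm{E}(X_t - X_s)^2 + (t-s)^2\,\mathrm{E}Z^2 = \bigl(t - s - (t-s)^2\bigr) + (t-s)^2 = t - s ,$$
```

while $\mathrm{E}Y_t = \mathrm{E}X_t + t\,\mathrm{E}Z = 0$. This is (16.28) together with the centering required in Definition 16.4.2, and it is precisely why the correction term $(t-s)^2/(d-c)$ appears in (16.53).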
We state now the main step in the proof of Bednorz’s theorem. 
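Proposition 16.6.3 Consider $I = \,]0,1]$, and for $j = 1,2,3,4$, consider $I_j = \,](j-1)/4, j/4]$ and numbers $\alpha_j \ge 0$ such that $\sum_{j\le 4}\alpha_j = 1$. Then

$$F(I) \ge \sum_{1\le j\le 4}\sqrt{\alpha_j}\,F(I_j) . \qquad (16.55)$$

Moreover, if for each $1 \le j \le 4$ we have $\alpha_j \ge 1/400$ and $T \cap I_j \ne \emptyset$, then

$$F(I) \ge \sum_{1\le j\le 4}\sqrt{\alpha_j}\,F(I_j) + \frac{1}{80} . \qquad (16.56)$$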


The inequality (16.56) is a kind of growth condition. The constants 80 and 400 are just convenient choices and do not carry special meaning.³ There is no magic: to prove this result, given normalized processes on the intervals $I_j$, we have to construct a normalized process witnessing (16.55), and this will require hard work. This work will be performed at the end of the section, and we first turn to the comparatively easier task of deducing Bednorz’s theorem from this result. We first state a kind of rescaling of Proposition 16.6.3, where, instead of starting with $I = \,]0,1]$, we start with any dyadic interval.


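Corollary 16.6.4 Consider $I \in \mathcal{I}_m$ and the four intervals $I_j$ of $\mathcal{I}_{m+2}$, $j = 1,2,3,4$, which it contains. Consider numbers $\alpha_j \ge 0$ such that $\sum_{j\le 4}\alpha_j = 1$. Then

$$F(I) \ge \sum_{1\le j\le 4}\sqrt{\alpha_j}\,F(I_j) . \qquad (16.57)$$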


3 The condition $\alpha_j \ge 1/400$ simply ensures that $\alpha_j$ stays away from 0.
Moreover, if for each $1 \le j \le 4$ we have $\alpha_j \ge 1/400$ and $T \cap I_j \ne \emptyset$, then

$$F(I) \ge \sum_{1\le j\le 4}\sqrt{\alpha_j}\,F(I_j) + \frac{2^{-m/2}}{80} . \qquad (16.58)$$

Proof Denoting by $F_T(I)$ the quantity (16.54) to indicate the dependence on $T$, it suffices to prove that for $a > 0$ and $b \in \mathbb{R}$, we have, with obvious notation, $F_{aT+b}(aI+b) = \sqrt{a}\,F_T(I)$. This follows from the fact that if the process $(X_t)_{t\in aJ+b}$ is normalized on the interval $aJ+b$, then the process $Y_t = a^{-1/2}X_{at+b}$ is normalized on the interval $J$. □
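A short check of the rescaling claim (a worked step in the notation of the proof). If $(X_u)$ is normalized on $aJ+b = [ac+b, ad+b]$, an interval of length $a(d-c)$, and $Y_t = a^{-1/2}X_{at+b}$ for $t \in J = [c,d]$, then $Y_c = Y_d = 0$, $\mathrm{E}Y_t = 0$, and for $s < t$ in $J$, (16.53) gives

```latex
$$\mathrm{E}(Y_s - Y_t)^2 = \frac{1}{a}\Bigl(a(t-s) - \frac{a^2(t-s)^2}{a(d-c)}\Bigr) = t - s - \frac{(t-s)^2}{d-c} .$$
```

Since $\sup_{t\in T\cap I}Y_t = a^{-1/2}\sup_{u\in(aT+b)\cap(aI+b)}X_u$, taking suprema over normalized processes gives $F_{aT+b}(aI+b) = \sqrt{a}\,F_T(I)$. With $a = 2^{-m}$, this turns the term $1/80$ of (16.56) into the term $2^{-m/2}/80$ of (16.58).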


We recall that $\mathcal{I}_n$ denotes the family of $2^n$ dyadic intervals of length $2^{-n}$. Theorem 16.4.3 is a consequence of Lemma 16.6.2 and the following:


Proposition 16.6.5 Consider a finite set $T \subset [0,1]$. Then, given a probability measure $\mu$ on $T$, we have

$$\sum_{n\ge 0}\sum_{I\in\mathcal{I}_n}2^{-n/2}\sqrt{\mu(I)} \le L\bigl(1 + F(]0,1])\bigr) . \qquad (16.59)$$


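As a preparation for the proof of this result, we fix a probability measure $\mu$ on $T$, and for $n \ge 0$, we define $\mathcal{I}^*_{2n}$ as the collection of intervals $I \in \mathcal{I}_{2n}$ that have the following property:

$$\forall I' \in \mathcal{I}_{2n+2} ,\quad I' \subset I \Rightarrow \mu(I') \ge \mu(I)/400 . \qquad (16.60)$$

We then define

$$M_n = 2^{-n}\sum_{I\in\mathcal{I}^*_{2n}}\sqrt{\mu(I)} . \qquad (16.61)$$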
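Lemma 16.6.6 We have

$$\sum_{n\ge 0}M_n \le 80F(]0,1]) . \qquad (16.62)$$

Proof We recall that $F(I) = 0$ when $I \cap T = \emptyset$. We prove that for each $n \ge 0$, we have

$$\sum_{I\in\mathcal{I}_{2n}}\sqrt{\mu(I)}\,F(I) \ge \frac{1}{80}M_n + \sum_{I\in\mathcal{I}_{2n+2}}\sqrt{\mu(I)}\,F(I) . \qquad (16.63)$$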
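Given $I \in \mathcal{I}_{2n}$, let us denote by $I_1$, $I_2$, $I_3$, and $I_4$ the intervals of $\mathcal{I}_{2n+2}$ which are contained in $I$. Then, using (16.57) for $\alpha_j = \mu(I_j)/\mu(I)$, we obtain

$$\sqrt{\mu(I)}\,F(I) \ge \sum_{j\le 4}\sqrt{\mu(I_j)}\,F(I_j) = \sum_{I'\subset I,\ I'\in\mathcal{I}_{2n+2}}\sqrt{\mu(I')}\,F(I') . \qquad (16.64)$$

If moreover $I \in \mathcal{I}^*_{2n}$, we can now use (16.58) with the same choice of $\alpha_j$ and $m = 2n$ (so that $2^{-m/2} = 2^{-n}$) to obtain the better inequality

$$\sqrt{\mu(I)}\,F(I) \ge \sum_{I'\subset I,\ I'\in\mathcal{I}_{2n+2}}\sqrt{\mu(I')}\,F(I') + \frac{1}{80}2^{-n}\sqrt{\mu(I)} . \qquad (16.65)$$

Summation of the inequalities (16.64) and (16.65) over $I \in \mathcal{I}_{2n}$ completes the proof of (16.63). Rewriting (16.63) as $M_n \le 80(S_n - S_{n+1})$, where $S_n = \sum_{I\in\mathcal{I}_{2n}}\sqrt{\mu(I)}\,F(I)$, and summing over $n$ (the sum telescopes, $S_0 = F(]0,1])$ and $S_n \ge 0$) completes the proof. □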
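Lemma 16.6.7 Consider numbers $\alpha_j \ge 0$ for $j = 1,2,3,4$ such that $\sum_{1\le j\le 4}\alpha_j = 1$. Then

$$\min_{j\le 4}\alpha_j \le \frac{1}{400} \;\Longrightarrow\; \frac12\sum_{j\le 4}\sqrt{\alpha_j} \le \frac{9}{10} . \qquad (16.66)$$

Proof Assume, for example, that $\alpha_1 \le 1/400$. Concavity of the function $x \mapsto \sqrt{x}$ shows that $\sqrt{\alpha_2} + \sqrt{\alpha_3} + \sqrt{\alpha_4} \le \sqrt{3(\alpha_2+\alpha_3+\alpha_4)} \le \sqrt{3}$, so that

$$\frac12\sum_{j\le 4}\sqrt{\alpha_j} \le \frac12\Bigl(\frac{1}{20} + \sqrt{3}\Bigr) \le \frac{9}{10} .$$

□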
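A random spot check of (16.66), not a proof: the sketch samples $\alpha$'s with $\alpha_1 \le 1/400$ and compares $\frac12\sum_j\sqrt{\alpha_j}$ with the bound $\frac12(1/20+\sqrt3)$ obtained in the proof. Without the constraint the quantity can reach 1 (take all $\alpha_j = 1/4$), which is why the deficit below 1 matters later in (16.71). The names are hypothetical.

```python
import math
import random


def half_sum_sqrt(alphas):
    # (1/2) * sum_j sqrt(alpha_j), the quantity bounded in (16.66)
    return 0.5 * sum(math.sqrt(a) for a in alphas)


# Bound from the proof: sqrt(alpha_1) <= 1/20, and the other three contribute at most sqrt(3).
bound_from_proof = 0.5 * (1 / 20 + math.sqrt(3))


def random_check(trials=20_000, seed=0):
    """Largest half_sum_sqrt over random alphas with alpha_1 <= 1/400 and sum 1."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        small = rng.uniform(0, 1 / 400)  # enforce min_j alpha_j <= 1/400
        w = [rng.uniform(0.001, 1) for _ in range(3)]
        s = sum(w)
        alphas = [small] + [(1 - small) * x / s for x in w]
        worst = max(worst, half_sum_sqrt(alphas))
    return worst


print(round(bound_from_proof, 4))  # 0.891
print(random_check() <= bound_from_proof <= 0.9)
```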


The following will complete the proof of Proposition 16.6.5: 
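Lemma 16.6.8 We have

$$\sum_{n\ge 0}\sum_{I\in\mathcal{I}_n}2^{-n/2}\sqrt{\mu(I)} \le 2S(\mu) := 2\sum_{n\ge 0}\sum_{I\in\mathcal{I}_{2n}}2^{-n}\sqrt{\mu(I)} \qquad (16.67)$$

and

$$S(\mu) \le 10 + 10\sum_{n\ge 0}M_n . \qquad (16.68)$$

Proof An interval $I \in \mathcal{I}_n$ is the union of two intervals $I'$ and $I''$ of $\mathcal{I}_{n+1}$, and $\mu(I) = \mu(I') + \mu(I'')$. The inequality $\sqrt{a} + \sqrt{b} \le \sqrt{2}\sqrt{a+b}$ implies that $2^{-(n+1)/2}\bigl(\sqrt{\mu(I')} + \sqrt{\mu(I'')}\bigr) \le 2^{-n/2}\sqrt{\mu(I)}$. Consequently, the sequence $c_n := \sum_{I\in\mathcal{I}_n}2^{-n/2}\sqrt{\mu(I)}$ is non-increasing. Thus, $\sum_{n\ge 0}c_{2n+1} \le \sum_{n\ge 0}c_{2n}$, so that $\sum_{n\ge 0}c_n \le 2\sum_{n\ge 0}c_{2n}$, which is (16.67).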
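The monotonicity of $c_n$ used for (16.67) can be observed on a random measure. The sketch assumes, for illustration only, that $\mu$ is described by its masses on the $2^N$ intervals of $\mathcal{I}_N$; it builds the coarser levels by merging siblings (using $\mu(I) = \mu(I') + \mu(I'')$), then checks that $c_n$ is non-increasing and that the version of (16.67) truncated at level $N$ holds. The names are hypothetical.

```python
import math
import random


def c_sequence(weights):
    """c_n = sum_{I in I_n} 2^{-n/2} sqrt(mu(I)) for n = 0..N.

    weights[k] is the mass of the k-th interval of I_N (len(weights) == 2^N).
    """
    N = int(math.log2(len(weights)))
    levels = [list(weights)]
    for _ in range(N):
        finer = levels[-1]
        # merge the two children of each interval of the next coarser level
        levels.append([finer[2 * k] + finer[2 * k + 1] for k in range(len(finer) // 2)])
    levels.reverse()  # levels[n] now lists mu(I) for I in I_n
    return [2 ** (-n / 2) * sum(math.sqrt(m) for m in levels[n]) for n in range(N + 1)]


rng = random.Random(1)
raw = [rng.random() ** 3 for _ in range(2 ** 10)]
total = sum(raw)
c = c_sequence([x / total for x in raw])
print(all(c[n + 1] <= c[n] + 1e-12 for n in range(len(c) - 1)))  # non-increasing
print(sum(c) <= 2 * sum(c[0::2]) + 1e-12)  # (16.67), truncated at level N
```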
For $I \in \mathcal{I}_{2n}$, let

$$w(I) = 2^{-n-1}\sum_{J\subset I,\ J\in\mathcal{I}_{2n+2}}\sqrt{\mu(J)} .$$

The equality

$$S(\mu) = 1 + \sum_{n\ge 0}\sum_{I\in\mathcal{I}_{2n}}w(I) \qquad (16.69)$$

holds because all the terms in the summation that defines $S(\mu)$ occur in one of the terms $w(I)$, except the term for $n = 0$, which is equal to 1 since $\mu$ is a probability. Given $n \ge 0$,

$$\sum_{I\in\mathcal{I}_{2n}}w(I) = \sum_{I\in\mathcal{I}^*_{2n}}w(I) + \sum_{I\in\mathcal{I}_{2n}\setminus\mathcal{I}^*_{2n}}w(I) . \qquad (16.70)$$

Consider $I \in \mathcal{I}_{2n}\setminus\mathcal{I}^*_{2n}$ and denote by $I_j$, $j = 1,2,3,4$, the four intervals of $\mathcal{I}_{2n+2}$ contained in $I$. Define $\alpha_j = \mu(I_j)/\mu(I)$. By definition of $\mathcal{I}^*_{2n}$, the smallest of these four numbers is $< 1/400$, so that Lemma 16.6.7 implies that

$$w(I) \le \frac{9}{10}2^{-n}\sqrt{\mu(I)} . \qquad (16.71)$$


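Summation of the relations (16.71) yields

$$\sum_{n\ge 0}\sum_{I\in\mathcal{I}_{2n}\setminus\mathcal{I}^*_{2n}}w(I) \le \frac{9}{10}\sum_{n\ge 0}\sum_{I\in\mathcal{I}_{2n}}2^{-n}\sqrt{\mu(I)} = \frac{9}{10}S(\mu) .$$

Moreover, for $I \in \mathcal{I}^*_{2n}$, the Cauchy-Schwarz inequality gives $\sum_{J\subset I,\ J\in\mathcal{I}_{2n+2}}\sqrt{\mu(J)} \le 2\sqrt{\mu(I)}$, i.e., $w(I) \le 2^{-n}\sqrt{\mu(I)}$. Combining with (16.69) and (16.70) and recalling (16.61), we thus obtain

$$S(\mu) \le 1 + \frac{9}{10}S(\mu) + \sum_{n\ge 0}\sum_{I\in\mathcal{I}^*_{2n}}2^{-n}\sqrt{\mu(I)} = 1 + \frac{9}{10}S(\mu) + \sum_{n\ge 0}M_n .$$

This completes the proof. □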

It remains only to prove Proposition 16.6.3. This proof occupies the rest of the present section. To prove the proposition, starting with normalized processes⁴ $(Y^j_t)_{t\in J_j}$ for $1 \le j \le 4$, where $J_j = [(j-1)/4, j/4]$, we shall construct a suitable normalized process on $I = \,]0,1]$. Let us set

$$\varphi(x) = x - 4x^2 , \qquad (16.72)$$


4 We remind the reader that in particular $\mathrm{E}Y^j_t = 0$.




so that saying that the process $Y^j$ is normalized means exactly that for $s,t \in J_j$, $s < t$, we have

$$\mathrm{E}(Y^j_t - Y^j_s)^2 = \varphi(t-s) .$$

We start with some preparations. For $t \in [0,1]$ and $1 \le j \le 4$, let us define

$$t^j = \max\bigl(\min(t, j/4), (j-1)/4\bigr) \in J_j = [(j-1)/4, j/4] .$$

In words, $t^j = (j-1)/4$ if $t \le (j-1)/4$, $t^j = t$ if $(j-1)/4 \le t \le j/4$, and $t^j = j/4$ if $t \ge j/4$.


For $0 \le s \le t \le 1$, the interval $]s,t]$ is the disjoint union of the intervals $]s^j,t^j]$ for $1 \le j \le 4$. In particular, we have

$$\mathbf{1}_{]s,t]} = \sum_{j\le 4}\mathbf{1}_{]s^j,t^j]} . \qquad (16.73)$$


Consider the probability space $[0,1]$ provided with Lebesgue’s measure. (Thus, $\mathrm{E}$ refers simply to integration with respect to this measure.) The archetypical example of a normalized process on $[0,1]$ is given by the formula

$$W_t = \mathbf{1}_{[0,t]} - t . \qquad (16.74)$$

Our first goal is to play with this process to discover the useful algebraic identity (16.76). Consider the algebra $\mathcal{S}$ of subsets of $[0,1]$ generated by the intervals $I_j$ for $1 \le j \le 4$, and denote by $\mathrm{E}_{\mathcal{S}}$ conditional expectation with respect to this algebra. We define

$$V_t = \mathrm{E}_{\mathcal{S}}W_t = \mathrm{E}_{\mathcal{S}}(\mathbf{1}_{[0,t]} - t) . \qquad (16.75)$$
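Lemma 16.6.9 We have the identity

$$t - s - (t-s)^2 = \sum_{1\le j\le 4}\varphi(t^j - s^j) + \mathrm{E}(V_t - V_s)^2 . \qquad (16.76)$$

Proof We define $V'_t = W_t - \mathrm{E}_{\mathcal{S}}W_t$, so that $W_t = V_t + V'_t$ and $V_t$ is $\mathcal{S}$-measurable, while $\mathrm{E}_{\mathcal{S}}(V'_t) = 0$. Given two functions $f, f'$ with $\mathrm{E}_{\mathcal{S}}f = 0$ and $f'$ $\mathcal{S}$-measurable, then $\mathrm{E}ff' = \mathrm{E}\,\mathrm{E}_{\mathcal{S}}(ff') = \mathrm{E}(f'\,\mathrm{E}_{\mathcal{S}}f) = 0$, so that $\mathrm{E}(f+f')^2 = \mathrm{E}f^2 + \mathrm{E}f'^2$. Consequently, for $s < t$,

$$t - s - (t-s)^2 = \mathrm{E}(W_t - W_s)^2 = \mathrm{E}(V'_t - V'_s)^2 + \mathrm{E}(V_t - V_s)^2 . \qquad (16.77)$$

Keeping in mind that $\lambda(I_j) = 1/4$, we obtain from (16.73) that

$$\mathrm{E}_{\mathcal{S}}\mathbf{1}_{]s,t]} = 4\sum_{j\le 4}(t^j - s^j)\mathbf{1}_{I_j} . \qquad (16.78)$$

Since $\mathrm{E}\mathbf{1}_{I_j} = 1/4$, we obtain

$$\mathrm{E}(\mathrm{E}_{\mathcal{S}}\mathbf{1}_{]s,t]})^2 = 4\sum_{j\le 4}(t^j - s^j)^2 . \qquad (16.79)$$

For $s < t$, we have

$$\mathrm{E}(V'_t - V'_s)^2 = \mathrm{E}(\mathbf{1}_{]s,t]} - \mathrm{E}_{\mathcal{S}}\mathbf{1}_{]s,t]})^2 = \mathrm{E}(\mathbf{1}_{]s,t]}) - \mathrm{E}(\mathrm{E}_{\mathcal{S}}\mathbf{1}_{]s,t]})^2 .$$

Using (16.79) and since $\mathrm{E}(\mathbf{1}_{]s,t]}) = t - s = \sum_{j\le 4}(t^j - s^j)$, we obtain

$$\mathrm{E}(V'_t - V'_s)^2 = \sum_{1\le j\le 4}\varphi(t^j - s^j) . \qquad (16.80)$$

We then conclude from (16.77). □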
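Identity (16.76) can be verified in exact rational arithmetic, because $V_t$ is constant on each $I_k$ with value $4\lambda([0,t]\cap I_k) - t$, so that $\mathrm{E}(V_t - V_s)^2$ is a finite sum. A minimal sketch with hypothetical helper names:

```python
import random
from fractions import Fraction as Fr


def clip(t, j):
    # t^j = max(min(t, j/4), (j-1)/4), the projection of t onto J_j
    return max(min(t, Fr(j, 4)), Fr(j - 1, 4))


def phi(x):
    # phi(x) = x - 4x^2, as in (16.72)
    return x - 4 * x * x


def V(t):
    """Values of V_t = E_S(1_[0,t] - t) on I_1, ..., I_4 (V_t is constant on each I_k)."""
    return [4 * (clip(t, k) - Fr(k - 1, 4)) - t for k in range(1, 5)]


def identity_defect(s, t):
    # E(V_t - V_s)^2 = (1/4) * sum_k (value on I_k)^2, since lambda(I_k) = 1/4
    ev = sum((a - b) ** 2 for a, b in zip(V(t), V(s))) / 4
    rhs = sum(phi(clip(t, j) - clip(s, j)) for j in range(1, 5)) + ev
    lhs = (t - s) - (t - s) ** 2
    return lhs - rhs  # exactly 0 when (16.76) holds


rng = random.Random(2)
pairs = [sorted((Fr(rng.randint(0, 1000), 1000), Fr(rng.randint(0, 1000), 1000)))
         for _ in range(500)]
print(all(identity_defect(s, t) == 0 for s, t in pairs))
```

Expanding by hand gives the same conclusion: with $d_k = t^k - s^k$ and $D = t - s = \sum_k d_k$, one finds $\mathrm{E}(V_t - V_s)^2 = 4\sum_k d_k^2 - D^2$ and $\sum_k\varphi(d_k) = D - 4\sum_k d_k^2$.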


We go back to our main construction. It involves an auxiliary process $(Z_t)_{t\in[0,1]}$ and a r.v. $\tau \in \{1,2,3,4\}$. Throughout the proof, we assume the following:

$$\text{The processes } Y^j \text{ are independent of each other and of the r.v.s } Z_t \text{ and } \tau , \qquad (16.81)$$

$$\forall j \le 4 ,\quad P(\tau = j) = \alpha_j , \qquad (16.82)$$

$$\mathrm{E}Z_t = 0 ;\quad \mathrm{E}(Z_s - Z_t)^2 = \mathrm{E}(V_s - V_t)^2 . \qquad (16.83)$$

We do not assume that $Z_t$ and $\tau$ are independent. When $\alpha_j = 0$, for $t \in J_j$, let us define $U^j_t = Y^j_t$. Otherwise, we define

$$U^j_t = \frac{1}{\sqrt{\alpha_j}}\mathbf{1}_{\{\tau=j\}}Y^j_t , \qquad (16.84)$$

and we set

$$S_t = \sum_{1\le j\le 4}U^j_{t^j} . \qquad (16.85)$$

We then transform the process $(S_t)$ into a normalized process by adding $(Z_t)$ as follows:
Lemma 16.6.10 The process

$$X_t = S_t + Z_t \qquad (16.86)$$

is normalized.

Proof Assume $s < t$. Using the independence of $\tau$ and $Y^j$ in the second equality, and that the process $Y^j$ is normalized in the third one, we obtain

$$\mathrm{E}(U^j_{t^j} - U^j_{s^j})^2 = \frac{1}{\alpha_j}\mathrm{E}\mathbf{1}_{\{\tau=j\}}(Y^j_{t^j} - Y^j_{s^j})^2 = \mathrm{E}(Y^j_{t^j} - Y^j_{s^j})^2 = \varphi(t^j - s^j) . \qquad (16.87)$$

This formula remains true even when $\alpha_j = 0$, since then $U^j = Y^j$. It follows from (16.81) and the fact that $Y^j$ and $Y^{j'}$ are independent for $j \ne j'$ that then $\mathrm{E}U^j_uU^{j'}_v = 0$ for all $u, v$, so that

$$\mathrm{E}(S_t - S_s)^2 = \sum_{1\le j\le 4}\mathrm{E}(U^j_{t^j} - U^j_{s^j})^2 = \sum_{1\le j\le 4}\varphi(t^j - s^j) . \qquad (16.88)$$

It follows from (16.81) that $\mathrm{E}S_sZ_t = 0$, so that

$$\mathrm{E}(X_t - X_s)^2 = \mathrm{E}(S_t - S_s)^2 + \mathrm{E}(Z_t - Z_s)^2 ,$$

and the result follows from (16.83), (16.88), and (16.76). □


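Lemma 16.6.11 Assume that $T \cap I_j \ne \emptyset$ for each $j \le 4$. Then

$$\mathrm{E}\sup_{t\in T}X_t \ge \sum_{1\le j\le 4}\Bigl(\sqrt{\alpha_j}\,\mathrm{E}\sup_{t\in T\cap I_j}Y^j_t + \inf_{t\in T\cap I_j}\mathrm{E}\mathbf{1}_{\{\tau=j\}}Z_t\Bigr) . \qquad (16.89)$$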


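Proof First, we observe that, using that $T \cap I_j \ne \emptyset$ in the last inequality,

$$\begin{aligned}
\mathrm{E}\sup_{t\in T}X_t &= \sum_{1\le j\le 4}\mathrm{E}\Bigl(\mathbf{1}_{\{\tau=j\}}\sup_{t\in T}X_t\Bigr) \\
&= \sum_{1\le j\le 4}\mathrm{E}\sup_{t\in T}\mathbf{1}_{\{\tau=j\}}X_t \\
&\ge \sum_{1\le j\le 4}\mathrm{E}\sup_{t\in T\cap I_j}\mathbf{1}_{\{\tau=j\}}X_t . \qquad (16.90)
\end{aligned}$$

If $\alpha_j = 0$, we use that trivially $\mathrm{E}\sup_{t\in T\cap I_j}\mathbf{1}_{\{\tau=j\}}X_t = \inf_{t\in T\cap I_j}\mathrm{E}\mathbf{1}_{\{\tau=j\}}Z_t$ (both sides vanish, since $P(\tau = j) = 0$). Let us fix $j \le 4$ with $\alpha_j \ne 0$ and denote by $\mathrm{E}^j$ conditional expectation given the r.v.s $Y^j_t$. Then Jensen’s inequality implies

$$\mathrm{E}\sup_{t\in T\cap I_j}\mathbf{1}_{\{\tau=j\}}X_t \ge \mathrm{E}\sup_{t\in T\cap I_j}\mathrm{E}^j\mathbf{1}_{\{\tau=j\}}X_t . \qquad (16.91)$$

Let us fix $t \in I_j$, so that then $t^j = t$. Since $\mathbf{1}_{\{\tau=j\}}\mathbf{1}_{\{\tau=j'\}} = 0$ for $j' \ne j$, we have by definition of $X_t$

$$\mathbf{1}_{\{\tau=j\}}X_t = \frac{1}{\sqrt{\alpha_j}}\mathbf{1}_{\{\tau=j\}}Y^j_t + \mathbf{1}_{\{\tau=j\}}Z_t .$$

Using the independence of $\tau$ and $Y^j$ in the first equality, and the independence of $Y^j$ from $\tau$ and $Z$ in the second equality, we get

$$\mathrm{E}^j\mathbf{1}_{\{\tau=j\}}X_t = \sqrt{\alpha_j}\,Y^j_t + \mathrm{E}^j\mathbf{1}_{\{\tau=j\}}Z_t = \sqrt{\alpha_j}\,Y^j_t + \mathrm{E}\mathbf{1}_{\{\tau=j\}}Z_t . \qquad (16.92)$$

To conclude, we simply use that $\sup_t(y_t + z_t) \ge \sup_t y_t + \inf_t z_t$, and thus

$$\mathrm{E}\sup_{t\in T\cap I_j}\mathrm{E}^j\mathbf{1}_{\{\tau=j\}}X_t \ge \sqrt{\alpha_j}\,\mathrm{E}\sup_{t\in T\cap I_j}Y^j_t + \inf_{t\in T\cap I_j}\mathrm{E}\mathbf{1}_{\{\tau=j\}}Z_t .$$

□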


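Lemma 16.6.12 Even when $T \cap I_j = \emptyset$ for some $j \le 4$, if the process $(Z_t)$ is independent of $\tau$, we have

$$\mathrm{E}\sup_{t\in T}X_t \ge \sum_{j\in\mathcal{J}}\sqrt{\alpha_j}\,\mathrm{E}\sup_{t\in T\cap I_j}Y^j_t , \qquad (16.93)$$

where $\mathcal{J} = \{j \le 4 ;\ T \cap I_j \ne \emptyset\}$.

Proof As in (16.90), we obtain

$$\mathrm{E}\sup_{t\in T}X_t = \sum_{1\le j\le 4}\mathrm{E}\sup_{t\in T}\mathbf{1}_{\{\tau=j\}}X_t \ge \sum_{j\in\mathcal{J}}\mathrm{E}\sup_{t\in T\cap I_j}\mathbf{1}_{\{\tau=j\}}X_t ,$$

where we use that $\mathrm{E}\sup_{t\in T}\mathbf{1}_{\{\tau=j\}}X_t \ge \mathrm{E}\sup_{t\in T\cap I_j}\mathbf{1}_{\{\tau=j\}}X_t$ if $T \cap I_j \ne \emptyset$, and $\mathrm{E}\sup_{t\in T}\mathbf{1}_{\{\tau=j\}}X_t \ge 0$ because $\mathrm{E}\mathbf{1}_{\{\tau=j\}}X_t = 0$ for each $t$. Since $\tau$ and $Z_t$ are independent, and since $\mathrm{E}Z_t = 0$, (16.92) implies $\mathrm{E}^j\mathbf{1}_{\{\tau=j\}}X_t = \sqrt{\alpha_j}\,Y^j_t$, and the conclusion follows from (16.91). □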


We need one more ingredient, which is a consequence of the definition $V_t = \mathrm{E}_{\mathcal{S}}(\mathbf{1}_{[0,t]} - t)$.




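Lemma 16.6.13 We have

$$\inf_{t\in I_1}\mathrm{E}(V_t\mathbf{1}_{I_4}) \ge -\frac{1}{16} ;\quad \inf_{t\in I_2}\mathrm{E}(V_t\mathbf{1}_{I_1}) \ge \frac18 ;$$

$$\inf_{t\in I_3}\mathrm{E}(V_t\mathbf{1}_{I_2}) \ge \frac{1}{16} ;\quad \inf_{t\in I_4}\mathrm{E}(V_t\mathbf{1}_{I_3}) \ge 0 . \qquad (16.94)$$

Proof Indeed, for $t \in I_1$ and $x \in I_4$, we have $V_t(x) = -t \ge -1/4$; for $x \in I_1$ and $t \in I_2$, we have $V_t(x) = 1 - t \ge 1/2$; etc. It then suffices to multiply these bounds by $\lambda(I_k) = 1/4$. □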


Proof of Proposition 16.6.3 To prove (16.55), we simply choose $Z_t$ independent of $\tau$ (e.g., a copy of the process $(V_t)$ which is independent of all the other processes considered), and we use (16.93) since by definition $F(I_j) = 0$ when $I_j \cap T = \emptyset$. It remains only to prove (16.56). We shall use (16.89) with an appropriate choice of the process $(Z_t)$ (which will no longer be independent of $\tau$). This appropriate choice will make the quantity $\sum_{j\le 4}\inf_{t\in T\cap I_j}\mathrm{E}\mathbf{1}_{\{\tau=j\}}Z_t$ large.

For a subset $A$ of $[0,1]$, we denote by $A/100$ the set $\{x/100 ;\ x \in A\}$. Thus, $I/100 = \,]0,1/100]$ is the union of the four intervals $I_j/100$, each of length $1/400$. For each $j \le 4$, we have $P(\tau = j) = \alpha_j \ge 1/400$. Without loss of generality, we may then assume that the underlying probability space is $[0,1]$ provided with Lebesgue’s measure and that for $j \le 4$,

$$(I/100) \cap \{\tau = j\} = I_{n(j)}/100 , \qquad (16.95)$$

where $n(1) = 4$, $n(2) = 1$, $n(3) = 2$, and $n(4) = 3$. This will greatly simplify notation. Let us then define $Z_t(x) = 0$ for $x > 1/100$, and for $x \le 1/100$, let us define $Z_t(x) = 10V_t(100x)$, where $V_t$ is defined in (16.75). Using change of variable, we see that (16.83) holds.

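The fundamental relation is, recalling that $I_j = \,](j-1)/4, j/4]$,

$$\mathrm{E}\mathbf{1}_{\{\tau=j\}}Z_t = \frac{1}{10}\mathrm{E}(\mathbf{1}_{I_{n(j)}}V_t) .$$

The proof is straightforward by a change of variable:

$$\mathrm{E}\mathbf{1}_{\{\tau=j\}}Z_t = 10\int_{\{\tau=j\}\cap]0,1/100]}V_t(100x)\,\mathrm{d}x = 10\int_{I_{n(j)}/100}V_t(100x)\,\mathrm{d}x = \frac{1}{10}\int_{I_{n(j)}}V_t(x)\,\mathrm{d}x = \frac{1}{10}\mathrm{E}\mathbf{1}_{I_{n(j)}}V_t . \qquad (16.96)$$

It then follows from Lemma 16.6.13 that

$$\sum_{j\le 4}\inf_{t\in T\cap I_j}\mathrm{E}\mathbf{1}_{\{\tau=j\}}Z_t = \frac{1}{10}\sum_{j\le 4}\inf_{t\in T\cap I_j}\mathrm{E}(\mathbf{1}_{I_{n(j)}}V_t) \ge \frac{1}{10}\Bigl(-\frac{1}{16} + \frac18 + \frac{1}{16} + 0\Bigr) = \frac{1}{80} ,$$

and combining with (16.89), this completes the proof. □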
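The constant $1/80$ in (16.56) comes from the four infima of Lemma 16.6.13, paired through the relabelling $n(j)$ of (16.95). On each $I_j$ considered, $t \mapsto V_t|_{I_{n(j)}}$ is affine and non-increasing in $t$, so its infimum over $I_j$ is attained at the right endpoint $j/4$. That endpoint lies on the grid used below, so the grid minimum is exact here. A sketch with hypothetical names:

```python
from fractions import Fraction as Fr


def V_on(t, k):
    # value of V_t on I_k: 4 * lambda([0,t] ∩ I_k) - t (see (16.75))
    tk = max(min(t, Fr(k, 4)), Fr(k - 1, 4))
    return 4 * (tk - Fr(k - 1, 4)) - t


n_of = {1: 4, 2: 1, 3: 2, 4: 3}  # the relabelling n(j) used in (16.95)


def inf_term(j, grid=400):
    # inf over t in I_j = ](j-1)/4, j/4] of E(1_{I_n(j)} V_t) = (value of V_t on I_n(j)) / 4,
    # evaluated on the grid t = (j-1)/4 + i/(4*grid), i = 1..grid
    ts = [Fr(j - 1, 4) + Fr(i, 4 * grid) for i in range(1, grid + 1)]
    return min(V_on(t, n_of[j]) / 4 for t in ts)


terms = [inf_term(j) for j in range(1, 5)]
print(terms)  # [Fraction(-1, 16), Fraction(1, 8), Fraction(1, 16), Fraction(0, 1)]
print(sum(terms) / 10)  # 1/80
```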
One may also ask the following question: What are the sequences $(a_m)$ such that for any permutation $\pi$ and any orthonormal sequence $(\varphi_m)$, the series $\sum_{m\ge 1}a_{\pi(m)}\varphi_m$ converges a.s.? The answer to this question was also discovered by A. Paszkiewicz and is announced in [81]. Given the sequence $(a_m)$ and the permutation $\pi$ of $\mathbb{N}$, we define the set

$$T_\pi = \Bigl\{\sum_{1\le m\le n}a_{\pi(m)}^2 ;\ n \ge 1\Bigr\} . \qquad (16.97)$$

We also consider the numbers

$$b_k = \sum\bigl\{a_m^2 ;\ 2^{-2^{k+1}} \le a_m^2 < 2^{-2^k}\bigr\} . \qquad (16.98)$$

Without loss of generality, we assume that $\sum_m a_m^2 \le 1/2$, so that $\sum_k b_k \le 1/2$.


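Theorem 16.7.1 For a sequence $(a_m)$, the following are equivalent:

(f) For every permutation $\pi$ and every orthonormal sequence $(\varphi_m)$, the series $\sum_{m\ge 1}a_{\pi(m)}\varphi_m$ converges a.s.
(g) We have

$$\sum_{k\ge 1}2^k\sqrt{b_k} < \infty . \qquad (16.99)$$

(h) There exists $\pi$ such that

$$\sum_{n\ge 0}\sqrt{2^{-n}\operatorname{card}\{I \in \mathcal{I}_n ;\ I \cap T_\pi \ne \emptyset\}} < \infty . \qquad (16.100)$$

(i) Condition (16.100) holds for each $\pi$.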


This should be compared with Corollary 16.3.2, which asserts that when the series $\sum_m a_m\varphi_m$ converges a.s. whatever the choice of the orthonormal sequence $(\varphi_m)$, then $\sum_k 2^k b_k < \infty$. The stronger hypothesis of Theorem 16.7.1 implies the stronger conclusion (16.99).

It is not very difficult to prove the equivalence of (g) to (i), and this is what we will do first. We will then deduce the rest of Theorem 16.7.1 from Theorem 16.2.1. Condition (16.100) is the natural “covering number condition” adapted to processes that satisfy (16.13) (see Exercise 16.2.2). The deepest idea of the proof is that the more sophisticated conditions of Theorem 16.2.1 are equivalent to this natural covering condition “when the set $T$ is homogeneous”, and to prove that (f) implies (h), we will construct $\pi$ so that $T_\pi$ is “as homogeneous as possible”.

Let us prove “the easy part” of Theorem 16.7.1. 
Proposition 16.7.2 Conditions (g) to (i) of Theorem 16.7.1 are equivalent.

Proof We first prove that conditions (16.99) and (16.100) are equivalent when $\pi$ is the identity; we then write $T = T_\pi$. We first prove in that case that (16.100) implies (16.99). For $n \ge 1$, we define $\mathcal{J}_n$ as the set of dyadic intervals $I \in \mathcal{I}_n$ for which $T \cap I \ne \emptyset$.

In Lemma 16.3.3, we constructed a probability measure $\mu$ on $T$ which satisfies (16.24) for $2^{k-1} \le n < 2^k$, i.e., the first inequality in (16.101) below. Thus,


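$$\frac{1}{L}\sqrt{b_k} \le \sum_{I\in\mathcal{I}_n}\sqrt{2^{-n}\mu(I)} = \sum_{I\in\mathcal{J}_n}\sqrt{2^{-n}\mu(I)} \le \sqrt{2^{-n}\operatorname{card}\mathcal{J}_n} , \qquad (16.101)$$

where we have used the Cauchy-Schwarz inequality as in (16.17) in the last inequality. Summing over $n$ with $2^{k-1} \le n < 2^k$ yields

$$\frac{1}{L}2^k\sqrt{b_k} \le \sum_{2^{k-1}\le n<2^k}\sqrt{2^{-n}\operatorname{card}\mathcal{J}_n} .$$

Summing then over $k$ proves that (16.100) for the identity implies (16.99).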
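We next prove that (16.99) implies (16.100) when $\pi$ is the identity. We shall prove that

$$\sum_{n\ge 0}\sqrt{2^{-n}\operatorname{card}\mathcal{J}_n} < \infty . \qquad (16.102)$$

Let us as usual enumerate $T$ as a sequence $t_n = \sum_{1\le m\le n}a_m^2$, $n \ge 1$, and let $t^* = \sum_{m\ge 1}a_m^2$. Define

$$W_k = \{t_n ;\ \max(a_n^2, a_{n+1}^2) \ge 2^{-2^k}\} \cup \{t_1, t^*\} ,$$

$$V_k = \bigcup\bigl\{[t_n, t_{n+1}] ;\ a_{n+1}^2 = t_{n+1} - t_n < 2^{-2^k}\bigr\} \subset [0,1] .$$

Denoting Lebesgue’s measure by $\lambda$, we deduce from (16.98) that

$$\lambda(V_k) \le \sum\{a_n^2 ;\ a_n^2 < 2^{-2^k}\} \le \sum_{r\ge k}b_r . \qquad (16.103)$$

Also,

$$\operatorname{card}W_k \le 2 + \operatorname{card}\{n ;\ \max(a_n^2, a_{n+1}^2) \ge 2^{-2^k}\} \le 2 + 2\operatorname{card}\{n ;\ a_n^2 \ge 2^{-2^k}\} ,$$

and since $\operatorname{card}\{n ;\ a_n^2 \ge 2^{-2^k}\} \le 2^{2^k}\sum_n a_n^2 \le 2^{2^k}$, we obtain

$$\operatorname{card}W_k \le 2 + 2\cdot 2^{2^k} . \qquad (16.104)$$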


Consider $2^{k+1} \le \ell < 2^{k+2}$ and an interval $I \in \mathcal{J}_\ell$. We will prove that one of the following occurs: Either

$$I \cap W_k \ne \emptyset \qquad (16.105)$$

or else

$$I \subset V_k . \qquad (16.106)$$

To prove this, we assume $I \cap W_k = \emptyset$, and we prove (16.106). First, since $I \in \mathcal{J}_\ell$, then $I$ meets $T \subset [t_1, t^*]$ by definition, so that either $t_1 \in I$, or $t^* \in I$, or else

$$I \subset\, ]t_1, t^*[\, \subset \bigcup_{n\ge 1}[t_n, t_{n+1}] . \qquad (16.107)$$

Since we assume that $I \cap W_k = \emptyset$, we have in particular that $t_1, t^* \notin I$, and thus (16.107) holds. Consider an interval $[t_n, t_{n+1}]$ which meets $I$. Then either $t_n$ or $t_{n+1}$ belongs to $I$, for otherwise $I \subset\, ]t_n, t_{n+1}[$, contradicting the fact that $I \cap T \ne \emptyset$. Since $I \cap W_k = \emptyset$, it cannot happen that both $t_n$ and $t_{n+1}$ belong to $W_k$. The definition of $W_k$ shows that $a_{n+1}^2 < 2^{-2^k}$, so that $[t_n, t_{n+1}] \subset V_k$ by definition of $V_k$. We have shown that every interval $[t_n, t_{n+1}]$ which meets $I$ is a subset of $V_k$. Since (16.107) holds, we have proved (16.106), finishing the proof that either (16.105) or (16.106) holds.

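There are at most $\operatorname{card}W_k$ intervals $I \in \mathcal{I}_\ell$ which satisfy (16.105). Since $\lambda(I) = 2^{-\ell}$ for $I \in \mathcal{I}_\ell$, there are at most $2^\ell\lambda(V_k)$ such intervals contained in $V_k$. Since every interval $I \in \mathcal{J}_\ell$ satisfies either (16.105) or (16.106), recalling (16.103) and (16.104),

$$\operatorname{card}\mathcal{J}_\ell \le \operatorname{card}W_k + 2^\ell\lambda(V_k) \le 2 + 2\cdot 2^{2^k} + 2^\ell\sum_{r\ge k}b_r ,$$

so that (using the inequality $\sqrt{a+b} \le \sqrt{a} + \sqrt{b}$)

$$\sqrt{\operatorname{card}\mathcal{J}_\ell} \le L2^{2^{k-1}} + 2^{\ell/2}\sqrt{\sum_{r\ge k}b_r} .$$

This holds whenever $2^{k+1} \le \ell < 2^{k+2}$, and therefore

$$\sum_{2^{k+1}\le\ell<2^{k+2}}\sqrt{2^{-\ell}\operatorname{card}\mathcal{J}_\ell} \le L\Bigl(2^{-2^{k-1}} + 2^k\sqrt{\sum_{r\ge k}b_r}\Bigr) .$$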
Summation of these inequalities over $k \ge 1$, use of the inequality $\sqrt{\sum_{r\ge k}b_r} \le \sum_{r\ge k}\sqrt{b_r}$, and of (16.99) implies (16.102). This concludes the proof that (16.99) implies (16.100) when $\pi$ is the identity.

The case of general $\pi$ follows, because the value of $b_k$ does not depend on the order in which we consider the elements $a_m$. So we simply replace the numbers $a_m$ by the numbers $a_{\pi(m)}$ to obtain the equivalence of (16.99) and (16.100). □
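The last step rests on the fact that $b_k$ depends only on the multiset $\{a_m^2\}$, whereas the covering numbers in (16.100) depend on the order $\pi$. A small sketch illustrating both facts. It assumes the boundary convention $2^{-2^{k+1}} \le a_m^2 < 2^{-2^k}$ in (16.98) and dyadic intervals $]k2^{-n},(k+1)2^{-n}]$, uses exact fractions, and all names are hypothetical.

```python
import math
import random
from fractions import Fraction


def b_coefficients(sq, K):
    """b_k for k = 0..K, where sq[m] = a_m^2 (boundary convention as stated above)."""
    b = [Fraction(0)] * (K + 1)
    for x in sq:
        for k in range(K + 1):
            if Fraction(1, 2 ** (2 ** (k + 1))) <= x < Fraction(1, 2 ** (2 ** k)):
                b[k] += x
    return b


def covering_count(sq, n):
    """card{I in I_n : I ∩ T_pi != ∅}, with T_pi the partial sums of sq in the given order."""
    partial, hit = Fraction(0), set()
    for x in sq:
        partial += x
        hit.add(math.ceil(partial * 2**n) - 1)  # index of ]k 2^-n, (k+1) 2^-n] containing it
    return len(hit)


rng = random.Random(3)
sq = [Fraction(1, 2 ** rng.randint(2, 20)) for _ in range(60)]
total = sum(sq)
sq = [x / (2 * total) for x in sq]  # normalise so that sum_m a_m^2 = 1/2
shuffled = sq[:]
rng.shuffle(shuffled)
print(b_coefficients(sq, 5) == b_coefficients(shuffled, 5))  # b_k ignores the order
print([covering_count(sq, n) for n in range(8)])
print([covering_count(shuffled, n) for n in range(8)])
```

Comparing the last two lists on examples shows how the covering numbers of $T_\pi$ can change with $\pi$, while the $b_k$ do not.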


The next goal is to complete the proof of Theorem 16.7.1. First, we prove that (i) implies (f). This is because each set $T_\pi$ satisfies (16.100), so that it satisfies condition (c) of Theorem 16.2.1, as follows from the Cauchy-Schwarz inequality (as in (16.17) again): for each probability measure $\mu$ on $T_\pi$,

$$\sum_{I\in\mathcal{I}_n}\sqrt{2^{-n}\mu(I)} \le \sqrt{2^{-n}\operatorname{card}\{I \in \mathcal{I}_n ;\ I \cap T_\pi \ne \emptyset\}} .$$


Now we come to the main argument: the proof that (f) implies (g). Let us first introduce some notation. Given a finite set $U$, let us denote by $\mathcal{T}(U)$ the collection of sets of the type

$$T = \{0, u_1, u_1 + u_2, \ldots, u_1 + u_2 + \cdots + u_q\} ,$$

where $q = \operatorname{card}U$ and $(u_\ell)_{\ell\le q}$ is an enumeration of the elements of $U$. In other words, to construct $T \in \mathcal{T}(U)$, we start with 0, and, having constructed an element of $T$, we construct the next largest element by adding an element of $U$, in such a manner that each element of $U$ is used exactly once for this purpose. For such a set $T$, we denote by $t^*(T)$ its largest element, so that $t^*(T)$ is the sum of the elements of $U$. The center of the proof is the following:


Proposition 16.7.3 There exists a universal constant $L^*$ with the following property. Consider a finite set $J \subset \{10, 11, \ldots\}$ with the property that

$$k < k' \in J \;\Longrightarrow\; k' - k \ge 5 . \qquad (16.108)$$

For $k \in J$, consider a finite set $U_k \subset \mathbb{R}^+$, assume that
weer?” <aeo, (16.109) 
and set 


b= Dou. (16.110) 




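Assume also that $\sum_{k\in J} b_k \le 1/2$, and, denoting by $k^*$ the smallest element of $J$, assume that

$$\forall k \in J,\quad b_k \ge 2^{-4k}\,2^{-2^{k^*-3}}. \tag{16.111}$$

Let $U = \bigcup_{k\in J} U_k$. Then there exist a set $T \in \mathcal T(U)$ (so that $t^*(T) = \sum_{k\in J} b_k$) and a probability measure $\mu$ on $T$ with $\mu(\{0\}) = 0$, with the following property. Consider $x \in [0,1]$ with $x + t^*(T) \le 1$. Then for each $k \in J$, we have

$$\sum_{2^{k-2}\le n\le 2^{k-1}}\ \sum_{I\in\mathcal I_n,\ I-x\subset[0,t^*(T)]}\sqrt{2^{-n}\mu(I-x)} \ \ge\ \frac{2^k\sqrt{b_k}}{L^*}, \tag{16.112}$$

so that in particular

$$\sum_{n\ge0}\ \sum_{I\in\mathcal I_n,\ I-x\subset[0,t^*(T)]}\sqrt{2^{-n}\mu(I-x)} \ \ge\ \frac{1}{L^*}\sum_{k\in J}2^k\sqrt{b_k}. \tag{16.113}$$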


The number 5 in (16.108) is simply a convenient choice whose relevance will become apparent gradually. The condition (16.111) is purely technical. Its relevance will become apparent when we prove the proposition by induction over $\operatorname{card} J$. The important part is (16.112). To understand it better, note two facts. First, $I - x \subset [0, t^*(T)]$ if and only if $I \subset [x, x + t^*(T)]$. Second, for $x + t^*(T) \le 1$, the interval $[x, x + t^*(T)]$ is a subinterval of $[0,1]$. It therefore has a good chance to contain plenty of intervals $I \in \mathcal I_n$, which will contribute to making the left-hand side of (16.113) large (this would be less the case if $x + t^*(T)$ were $> 1$).

We will discuss and prove Proposition 16.7.3 later, but our first goal is to 
complete the proof of Theorem 16.7.1, i.e., to prove that (f) implies (g). 


Lemma 16.7.4 Assume that the sequence $(b_k)_{k\ge1}$ satisfies $\sum_{k\ge1}2^k\sqrt{b_k} = \infty$. Then, given $k$ and $A > 0$, we can find a finite set $J \subset \{k, k+1, \dots\}$ with the following properties: $J$ satisfies (16.108), $\sum_{r\in J}2^r\sqrt{b_r} \ge A$, and $b_r \ge 2^{-4r}$ for all $r \in J$.

Proof For $0 \le j \le 4$, consider the set $I_j$ of integers $r \ge k$ which are equal to $j$ modulo 5, so that each such set satisfies (16.108). There exists $0 \le j \le 4$ such that $\sum_{r\in I_j}2^r\sqrt{b_r} = \infty$. Define now $I = \{r \in I_j ;\ b_r \ge 2^{-4r}\}$. Since $\sum_{r\in I_j\setminus I}2^r\sqrt{b_r} \le \sum_{r\ge1}2^r2^{-2r} < \infty$, we have $\sum_{r\in I}2^r\sqrt{b_r} = \infty$. Then there is a finite subset $J$ of $I$ with $\sum_{r\in J}2^r\sqrt{b_r} \ge A$. $\square$
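The proof of Lemma 16.7.4 is constructive except for one step: choosing a residue class whose sum is infinite, which no finite computation can decide. The Python sketch below works under a stated assumption. It only sees $b_1, \dots, b_N$, and it replaces "the class with infinite sum" by the class with the largest truncated sum. This is a heuristic substitute, not the lemma's argument. The function name `select_J` is hypothetical.

```python
import math

def select_J(b, k, A):
    """Finite-horizon sketch of Lemma 16.7.4. b maps r -> b_r for r = 1..N.
    Returns a list J of integers >= k, pairwise >= 5 apart, with b_r >= 2^(-4r)
    and sum_{r in J} 2^r sqrt(b_r) >= A (if the horizon N is long enough)."""
    N = max(b)
    best_total, best_class = -1.0, []
    for j in range(5):
        # residue class j mod 5, filtered by b_r >= 2^(-4r)
        I = [r for r in range(k, N + 1) if r % 5 == j and b[r] >= 2.0 ** (-4 * r)]
        total = sum(2 ** r * math.sqrt(b[r]) for r in I)
        if total > best_total:               # heuristic stand-in for "infinite sum"
            best_total, best_class = total, I
    J, acc = [], 0.0
    for r in best_class:                     # shortest initial segment reaching A
        J.append(r)
        acc += 2 ** r * math.sqrt(b[r])
        if acc >= A:
            return J
    raise ValueError("horizon too short: no finite J reaches A")

b = {r: 4.0 ** -r for r in range(1, 41)}     # here 2^r sqrt(b_r) = 1 for every r
J = select_J(b, 3, 4.0)
```

Since the chosen integers all lie in one residue class modulo 5, condition (16.108) holds automatically.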


Proof that (f) implies (g) We argue by contradiction, and we assume that (g) fails, i.e., $\sum_{k\ge1}2^k\sqrt{b_k} = \infty$. Consider the set $V = \{a_m^2 ;\ m \ge 1\}$. We construct by induction the following objects: finite sets $V_s \subset V$ with $\max V_{s+1} < \min V_s$, sets $T_s \in \mathcal T(V_s)$, and probability measures $\mu_s$ on $T_s$ with $\mu_s(\{0\}) = 0$. They satisfy the following property. Consider $x \in [0,1]$ with $x + t^*(T_s) \le 1$. Then

$$\sum_{n\ge0}\ \sum_{I\in\mathcal I_n,\ I-x\subset[0,t^*(T_s)]}\sqrt{2^{-n}\mu_s(I-x)} \ \ge\ 2^s. \tag{16.114}$$

The construction is inductive. Having constructed $V_s$, consider $k_s$ such that $2^{-2^{k_s}} < u$ for all $u \in V_s$. We construct a set $J$ by applying Lemma 16.7.4 with $k = k_s + 1$ and $A = L^*2^{s+1}$. We then apply Proposition 16.7.3 with $U_k = \{a_m^2 ;\ 2^{-2^{k+1}} < a_m^2 \le 2^{-2^k}\}$ for $k \in J$ to find $V_{s+1}$, completing the induction.

By construction, for $s \ge 1$, we have $T_s = \{0\} \cup \{\sum_{m\le n} a_{r_{m,s}}^2 ;\ n \le q_s\}$, where the integers $r_{m,s}$ for $s \ge 1$ and $m \le q_s$ are all distinct. Consider then a permutation $\pi$ with the following property: for each $s$, the integers $r_{m,s}$, $1 \le m \le q_s$, occur as $\pi(j_s+1), \dots, \pi(j_s+q_s)$ for consecutive integers $j_s+1, \dots, j_s+q_s$. Let us set $x_s = \sum_{m\le j_s} a_{\pi(m)}^2$. Then for each $n \le q_s$, we have $x_s + \sum_{m\le n} a_{r_{m,s}}^2 = \sum_{m\le j_s+n} a_{\pi(m)}^2$, so that $x_s + T_s \subset T_\pi$. In particular, $x_s + t^*(T_s) \le 1$, so that we can use (16.114) for $x = x_s$ to obtain

$$\sum_{n\ge0}\sum_{I\in\mathcal I_n}\sqrt{2^{-n}\mu_s(I-x_s)} \ \ge\ \sum_{n\ge0}\ \sum_{I\in\mathcal I_n,\ I-x_s\subset[0,t^*(T_s)]}\sqrt{2^{-n}\mu_s(I-x_s)} \ \ge\ 2^s.$$

This proves that the probability measure $\nu$ on $T_\pi$ given by $\nu(C) = \mu_s(C - x_s)$ satisfies

$$\sum_{n\ge0}\sum_{I\in\mathcal I_n}\sqrt{2^{-n}\nu(I)} \ \ge\ 2^s.$$

Thus, condition (c) of Theorem 16.2.1 is not satisfied. Hence there exists an orthonormal sequence $(\varphi_m)$ such that the series $\sum_{m\ge1} a_{\pi(m)}\varphi_m$ does not converge a.s. So (f) fails, and the proof of Theorem 16.7.1 is complete. $\square$


We turn to the discussion of Proposition 16.7.3. The main idea is very simple (but unfortunately the details are quite taxing). For each $k$ in $J$, the following happens:

• If $n \le 2^{k-1}$, at scale $2^{-n}$, the measure $\mu$ looks very much like the uniform measure on a set $S_k$ with $\lambda(S_k) \ge b_k$.

• The set $S_k$ is a union of intervals. If $n \ge 2^{k-2}$, all these intervals are very much longer than $2^{-n}$.

Assuming $2^{k-2} \le n \le 2^{k-1}$, we can then pretend that $\mu$ is the uniform measure on $S_k$ to estimate the term $\sum_{I\in\mathcal I_n,\ I-x\subset[0,t^*(T)]}\sqrt{2^{-n}\mu(I-x)}$.

Let us first estimate how many $I \in \mathcal I_n$ are such that $I - x \subset S_k$. We have $I - x \subset S_k$ if and only if $I \subset x + S_k$. When $x \le 1 - t^*(T)$, we have $x + S_k \subset [0,1]$ because $S_k \subset [0, t^*(T)]$. Since $S_k$ is a union of intervals, each much longer than $2^{-n}$, for $x \le 1 - t^*(T)$ there are about $2^n\lambda(S_k)$ sets $I \in \mathcal I_n$ such that $I - x \subset S_k$.

For each of these sets, let us estimate $\mu(I - x)$. Since we pretend that $\mu$ is the uniform measure on $S_k$, it has density $1/\lambda(S_k)$ on $S_k$. So if $I - x \subset S_k$, then $\mu(I - x)$ is about $2^{-n}/\lambda(S_k)$, and hence $\sqrt{2^{-n}\mu(I-x)}$ is about $2^{-n}/\sqrt{\lambda(S_k)}$.

Thus, the sum $\sum_{I\in\mathcal I_n,\ I-x\subset[0,t^*(T)]}\sqrt{2^{-n}\mu(I-x)}$ has about $2^n\lambda(S_k)$ terms, each about $2^{-n}/\sqrt{\lambda(S_k)}$. So it is $\ge \sqrt{\lambda(S_k)}/L \ge \sqrt{b_k}/L$, and then

$$\sum_{2^{k-2}\le n\le 2^{k-1}}\ \sum_{I\in\mathcal I_n,\ I-x\subset[0,t^*(T)]}\sqrt{2^{-n}\mu(I-x)} \ \ge\ \frac{2^k\sqrt{b_k}}{L}, \tag{16.115}$$

which is the desired inequality (16.112).
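The heuristic claims that, for the uniform measure on a union $S$ of intervals much longer than $2^{-n}$, the level-$n$ sum is about $\sqrt{\lambda(S)}$. The Python sketch below checks this numerically. It is an illustration only, and it makes three assumptions: $\mathcal I_n$ is the family of dyadic intervals $[j2^{-n},(j+1)2^{-n})$, $t^*$ is the right end of $S$, and the name `dyadic_sum_uniform` is hypothetical. With dyadic endpoints, the sum equals $\sqrt{\lambda(S)}$ exactly.

```python
import math

def overlap(a, b, c, d):
    """Length of the intersection of [a, b) and [c, d)."""
    return max(0.0, min(b, d) - max(a, c))

def dyadic_sum_uniform(S, n, x=0.0):
    """Sum over I in I_n with I - x inside [0, t*] of sqrt(2^-n mu(I - x)), where mu
    is the uniform probability on the union S of disjoint intervals (a, b), t* = max b."""
    lam = sum(b - a for a, b in S)            # Lebesgue measure of S
    t_star = max(b for _, b in S)
    h = 2.0 ** -n
    total = 0.0
    for j in range(2 ** n):
        lo, hi = j * h - x, (j + 1) * h - x   # the shifted interval I - x = [lo, hi)
        if lo < 0 or hi > t_star:
            continue                          # keep only I with I - x inside [0, t*]
        mass = sum(overlap(lo, hi, a, b) for a, b in S) / lam
        total += math.sqrt(h * mass)
    return total

one_piece = dyadic_sum_uniform([(0.0, 0.25)], 10)                  # sqrt(0.25) = 0.5
two_pieces = dyadic_sum_uniform([(0.0, 0.125), (0.375, 0.5)], 10)  # also 0.5
```

Splitting $S$ into two pieces with a gap does not change the answer, since only $\lambda(S)$ matters at this scale.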

How is the situation of Proposition 16.7.3 possible? The important idea is separation of scales. If $k' \ge k + 1$, the elements of $U_{k'}$ are in a sense "infinitely smaller" than those of $U_k$. The use of (16.108) is to ensure separation of scales between the different values of $k \in J$.

Let us try to visualize the measure $\mu$ by blowing up the picture by a factor 2 every second [31]. At the beginning, the measure $\mu$ appears uniform on a given interval $[0, b]$. After a while, however, this no longer appears to be the case. Gaps appear, and $\mu$ seems to be carried by a union of many very small intervals (which can be of widely different sizes). It really looks like the uniform measure on the union of these intervals.

Waiting quite a bit longer, this again ceases to be the case. Gaps appear. Each of the very small intervals breaks into extremely small intervals. These can also be of widely different sizes. However, the longest of them are still very much shorter than the shortest of the previous very small intervals. And $\mu$ looks like the uniform measure on the union of these intervals. The amount of time it takes to go through one step of the process increases as a geometric series.

The main ingredient of the construction is the following principle. Consider two probability measures on a set $S \subset [0,1]$ which is a union of very small intervals. If these two measures give the same (small) mass to each of these intervals, then at a large scale they are nearly identical. The principle will be used when one of the probability measures is the uniform probability on $S$.

It could help the reader to start with the case card J = 1. 


Proof of Proposition 16.7.3 when card J = 1 Writing $k = k^*$ for simplicity, we have $J = \{k\}$, and all the elements $u$ of $U = U_k$ satisfy $2^{-2^{k+1}} < u \le 2^{-2^k}$. We enumerate $U = \{u_1, \dots, u_q\}$, so that $b := b_k = \sum_{\ell\le q} u_\ell \ge 2^{-4k}2^{-2^{k-3}}$ by (16.111). For $\ell \le q$, let $t_\ell = \sum_{m\le\ell} u_m$, and let $T = \{t_0 = 0, t_1, \dots, t_q\} \in \mathcal T(U)$. Consider the probability measure $\mu$ on $T$ such that $\mu(\{t_\ell\}) = u_\ell/b = (t_\ell - t_{\ell-1})/b$ for $1 \le \ell \le q$.

One way to visualize this measure is to start with the uniform measure on the interval $[0,b]$, which is the union of the intervals $]t_{\ell-1}, t_\ell]$. The small interval $]t_{\ell-1}, t_\ell]$ has total mass $u_\ell/b$. This mass is then swept to the right endpoint $t_\ell$ of this interval. What matters is that the small interval $]t_{\ell-1}, t_\ell]$ has the same mass for the uniform measure on $[0,b]$ and for $\mu$.

For $n \le 2^{k-1}$, at the scale $2^{-n}$, the probability $\mu$ looks like the uniform measure on the interval $[0,b]$. This is because the distance between two consecutive elements of $T$ is at most $2^{-2^k}$, which is very much smaller than $2^{-n}$. For $I \in \mathcal I_n$ with $I - x \subset [0,b]$, the measure of $I - x$ for the uniform measure on $[0,b]$ is $2^{-n}/b$. So $\mu(I - x)$ will have nearly the same value, and in particular we will have $\mu(I - x) \ge 2^{-n}/(Lb)$. Consequently, $\sqrt{2^{-n}\mu(I-x)} \ge 2^{-n}/(L\sqrt b)$.

Next, if $2^{k-2} \le n$, then $b2^n \ge 2^n2^{-4k}2^{-2^{k-3}}$ is much larger than 1. Thus, given $0 \le x \le 1 - t^*(T)$, there are about $2^nb$ sets $I \in \mathcal I_n$ for which $I \subset [x, x + t^*(T)] = [x, x + b] \subset [0,1]$. Equivalently, there are about $2^nb$ sets $I \in \mathcal I_n$ such that $I - x \subset [0, t^*(T)] = [0,b]$. Hence $\sum_{I\in\mathcal I_n,\ I-x\subset[0,t^*(T)]}\sqrt{2^{-n}\mu(I-x)}$ is at least of order $2^nb \times 2^{-n}/(L\sqrt b)$, i.e., $\sqrt b/L$. As this is true for each $2^{k-2} \le n \le 2^{k-1}$, the left-hand side of (16.112) is at least of order $2^k\sqrt b/L$, as desired.

Proving that things happen the way we described them requires no skill whatsoever, because there is all the room in the world for the estimates. This is better left to the reader, as any attempt to write these estimates makes the proof unreadable. $\square$
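The card $J = 1$ construction can be tried numerically. The hedged Python sketch below (helper names hypothetical, $\mathcal I_n$ taken as half-open dyadic intervals) forms the partial sums $t_\ell$ and puts mass $u_\ell/b$ at $t_\ell$. It then compares the level-$n$ sum with $\sqrt b$. By Cauchy-Schwarz the ratio can never exceed 1; the sketch shows it comes close. The sizes of the $u_\ell$ are chosen only for a fast run and do not satisfy (16.109) for any $k$. They only mimic the two features the argument uses: spacing much smaller than $2^{-n}$, and $2^nb$ much larger than 1.

```python
import math

def card_one_measure(U):
    """T = partial sums of the enumeration U, mu({t_l}) = u_l / b with b = sum(U), mu({0}) = 0."""
    b = sum(U)
    T, t = [0.0], 0.0
    for u in U:
        t += u
        T.append(t)
    masses = [0.0] + [u / b for u in U]
    return T, masses, b

def dyadic_sum(T, masses, n, x=0.0):
    """Sum over I = [j 2^-n, (j+1) 2^-n) with I - x inside [0, t*(T)] of sqrt(2^-n mu(I - x))."""
    h = 2.0 ** -n
    t_star = T[-1]
    total = 0.0
    for j in range(2 ** n):
        lo, hi = j * h - x, (j + 1) * h - x   # I - x = [lo, hi)
        if lo < 0 or hi > t_star:
            continue
        mass = sum(m for p, m in zip(T, masses) if lo <= p < hi)
        total += math.sqrt(h * mass)
    return total

q = 1000
U = [2.0 ** -12 * (1 + l / q) for l in range(1, q + 1)]   # distinct, in (2^-12, 2^-11]
T, masses, b = card_one_measure(U)
ratio = dyadic_sum(T, masses, 6) / math.sqrt(b)            # close to, but at most, 1
```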


Proof of Proposition 16.7.3 The proof is by induction over $\operatorname{card} J$. We denote by $k^*$ the smallest element of $J$, and we set $J' = J \setminus \{k^*\}$. We enumerate $U_{k^*}$ as $u_1, \dots, u_q$. The sum of these elements is at most 1, and each of them is $> 2^{-2^{k^*+1}}$. Thus, $q \le 2^{2^{k^*+1}}$. For $\ell \le q$, we set $\beta_\ell = u_\ell/b_{k^*}$, so that $\sum_{\ell\le q}\beta_\ell = 1$.

The proof will require using the induction hypothesis for each $\ell \le q$. The first step of the proof is to partition each set $U_k$ ($k \in J'$) into $q$ disjoint sets $(U_{k,\ell})_{\ell\le q}$. The elements of $U_{k,\ell}$ will be used for the construction associated with $\ell$. This partitioning is done in such a way that the proportion of $U_k$ attributed to $U_{k,\ell}$ is about $\beta_\ell$, that is,

$$b_{k,\ell} := \sum_{u\in U_{k,\ell}} u \ \approx\ \beta_\ell b_k = \beta_\ell\sum_{u\in U_k} u. \tag{16.116}$$

To prove that this is possible, it suffices to show that for $k \in J'$, the elements of $U_k$ are very small compared to $\beta_\ell b_k$. We have $\beta_\ell = u_\ell/b_{k^*} \ge u_\ell > 2^{-2^{k^*+1}}$. Hence, by (16.111),

$$\beta_\ell b_k \ \ge\ 2^{-2^{k^*+1}}\,2^{-4k}\,2^{-2^{k^*-3}} \ \ge\ 2^{-4k}\,2^{-2^{k^*+2}},$$

whereas the elements of $U_k$ are at most $2^{-2^k} \le 2^{-2^{k^*+5}}$. The smallest element $k'$ of $J'$ satisfies $k' \ge k^* + 5$. Since $b_{k,\ell} \approx \beta_\ell b_k$, we will therefore have $b_{k,\ell} \ge 2^{-4k}2^{-2^{k'-3}}$. In other words, condition (16.111) is satisfied for the sets $U_{k,\ell}$, $k \in J'$. Given $\ell$, we may then use the induction hypothesis for these sets. We then obtain sets $T_\ell \in \mathcal T(\bigcup_{k\in J'} U_{k,\ell})$, with

$$t^*(T_\ell) = \sum_{k\in J'} b_{k,\ell} \ \approx\ \beta_\ell\sum_{k\in J'} b_k, \tag{16.117}$$


5 Again, no skill whatsoever is required there. 
Fig. 16.1 View of the set $T$ at a scale where the intervals between the sets $T_\ell + v_\ell$ can be seen, but not yet the individual structure of these sets


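and probability measures $\mu_\ell$ on $T_\ell$ with $\mu_\ell(\{0\}) = 0$ and the following property. Consider $x \in [0,1]$ with $x + t^*(T_\ell) \le 1$. Then for each $k \in J'$, we have

$$\sum_{2^{k-2}\le n\le 2^{k-1}}\ \sum_{I\in\mathcal I_n,\ I-x\subset[0,t^*(T_\ell)]}\sqrt{2^{-n}\mu_\ell(I-x)} \ \ge\ \frac{2^k\sqrt{b_{k,\ell}}}{L^*}. \tag{16.118}$$

The set $T \setminus \{0\}$ will be the union over $\ell \le q$ of translates $v_\ell + T_\ell$ of the sets $T_\ell$. The numbers $v_\ell$ are recursively determined as follows. First, $v_1 = u_1$. Once $v_\ell$ has been constructed, let $w_\ell = v_\ell + t^*(T_\ell)$ be the largest element of $v_\ell + T_\ell$. Then $v_{\ell+1}$ is chosen so that the interval between $w_\ell$ and $v_{\ell+1}$ has length $u_{\ell+1}$, i.e., $v_{\ell+1} = w_\ell + u_{\ell+1}$.⁶ It should be obvious that

$$t^*(T) = \sum_{\ell\le q} u_\ell + \sum_{\ell\le q} t^*(T_\ell) = \sum_{k\in J} b_k.$$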


Next, we claim that $T \in \mathcal T(U)$. Recall that a set in $\mathcal T(U)$ has the following property: to go from one element of the set to the next largest one, one adds an element of $U$, in such a way that each element of $U$ is used exactly once in this manner. The element $u_\ell$ of $U_{k^*}$ is used to go from $w_{\ell-1}$ to $v_\ell$ (where $w_0 = 0$). The elements of $\bigcup_{k\in J'}U_{k,\ell}$ are used when going from one element of $v_\ell + T_\ell$ to the next (since $T_\ell \in \mathcal T(\bigcup_{k\in J'}U_{k,\ell})$). Conversely, consider going from an element $t \in v_\ell + T_\ell \subset T$ to the next element of $T$. This requires adding an element of $\bigcup_{k\in J'}U_{k,\ell}$, unless $t = w_\ell$, in which case it requires adding $u_{\ell+1}$ (Fig. 16.1).

The probability measure $\mu$ on $T$ is defined as $\sum_{\ell\le q}\beta_\ell\mu'_\ell$, where $\mu'_\ell$ is the translation by $v_\ell$ of the probability $\mu_\ell$. The main idea behind this construction is that at the scale $2^{-n}$ for $n \le 2^{k^*-1}$, the measure $\mu$ will look uniform on the interval $[0, t^*(T)]$. The reason for this is very simple. Since $\mu_\ell(\{0\}) = 0$, the probability $\mu'_\ell$ is supported by the interval $]v_\ell, w_\ell]$, which is contained in $]w_{\ell-1}, w_\ell]$ (where $w_0 = 0$). Hence $\mu(]w_{\ell-1}, w_\ell]) = \beta_\ell$. However, recalling (16.117), we have

$$w_\ell - w_{\ell-1} = u_\ell + t^*(T_\ell) \ \approx\ \beta_\ell b_{k^*} + \beta_\ell\sum_{k\in J'} b_k = \beta_\ell\sum_{k\in J} b_k = \beta_\ell\, t^*(T).$$

Thus, the measures of the very small intervals $]w_{\ell-1}, w_\ell]$ are almost exactly proportional to their lengths. So $\mu$ looks uniform on the interval $[0, t^*(T)]$ at the scale $2^{-n}$ for $n \le 2^{k^*-1}$. As we explained (and since $t^*(T) \ge b_{k^*}$), this implies (16.112) for $k = k^*$.

6 And, as we look at the structure at increasingly finer scales, these are the first gaps which will appear, and the gaps inside each block are much smaller.

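Now, we have to prove (16.112) for $k \in J'$. Consider $I \in \mathcal I_n$ with $I - x \subset [v_\ell, v_\ell + t^*(T_\ell)]$. Then $\mu(I - x) \ge \beta_\ell\mu'_\ell(I - x) = \beta_\ell\mu_\ell(I - (x + v_\ell))$, where the last equality uses that $\mu'_\ell$ is the translation of $\mu_\ell$ by $v_\ell$. We have shown that

$$\sum_{2^{k-2}\le n\le 2^{k-1}}\ \sum_{I\in\mathcal I_n,\ I-x\subset[v_\ell,\,v_\ell+t^*(T_\ell)]}\sqrt{2^{-n}\mu(I-x)} \ \ge\ \sqrt{\beta_\ell}\sum_{2^{k-2}\le n\le 2^{k-1}}\ \sum_{I\in\mathcal I_n,\ I-(x+v_\ell)\subset[0,t^*(T_\ell)]}\sqrt{2^{-n}\mu_\ell\big(I-(x+v_\ell)\big)}. \tag{16.119}$$

Consider $0 \le x \le 1 - t^*(T)$. Then for each $\ell \le q$, we have $x + v_\ell \le 1 - t^*(T_\ell)$. We can therefore use the induction hypothesis (16.118), with $x + v_\ell$ instead of $x$. It shows that the right-hand side of (16.119) is $\ge 2^k\sqrt{\beta_\ell b_{k,\ell}}/L^*$. Thus, we have shown that

$$\sum_{2^{k-2}\le n\le 2^{k-1}}\ \sum_{I\in\mathcal I_n,\ I-x\subset[v_\ell,\,v_\ell+t^*(T_\ell)]}\sqrt{2^{-n}\mu(I-x)} \ \ge\ \frac{2^k\sqrt{\beta_\ell b_{k,\ell}}}{L^*}. \tag{16.120}$$

The intervals $[v_\ell, v_\ell + t^*(T_\ell)]$, $\ell \le q$, are disjoint subintervals of $[0, t^*(T)]$. Summing the inequalities (16.120) over $\ell \le q$, we therefore obtain

$$\sum_{2^{k-2}\le n\le 2^{k-1}}\ \sum_{I\in\mathcal I_n,\ I-x\subset[0,t^*(T)]}\sqrt{2^{-n}\mu(I-x)} \ \ge\ \frac{2^k}{L^*}\sum_{\ell\le q}\sqrt{\beta_\ell b_{k,\ell}}.$$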


Since $b_{k,\ell} \approx \beta_\ell b_k$ and $\sum_{\ell\le q}\beta_\ell = 1$, the right-hand side is nearly $2^k\sqrt{b_k}/L^*$, and we almost obtain the required inequality (16.112). To complete the proof, it suffices to quantify the errors made in (16.116) and to show that they do not destroy the argument. Since there is plenty of room, this is better left to the reader.⁷

7 One may use in particular that, since the elements of $U_{k,\ell}$ are $\le 2^{-2^k}$, it should be obvious that one can achieve $b_{k,\ell} \ge \beta_\ell b_k - 2^{-2^k} \ge (1 - 2^{-k})\beta_\ell b_k$.


16.8 Chaining, II


For the special sets $T$ of the type (16.10), the equivalence of (c) and (d) of Theorem 16.2.1 tells us for which sets $T$ all the processes satisfying the increment condition $\mathbb E(X_s - X_t)^2 \le |s - t|$ for $s, t \in T$ are bounded. Our goal is to investigate the same question for more general metric spaces under the more general increment conditions (16.1):

$$\forall s,t\in T,\quad \mathbb E\,\varphi\Big(\frac{|X_s-X_t|}{d(s,t)}\Big) \le 1, \tag{16.1}$$

where $\varphi$ is a Young function as in Definition 16.1.1.

What are the weakest possible natural conditions that will ensure that we control the size of the process $(X_t)_{t\in T}$ under (16.1)? We consider this question in the remainder of this chapter.

The material of this section is self-contained, but the reader might do well to master first the simpler ideas of Sect. 16.1 to provide perspective. For simplicity, we consider only the case where $T$ is finite. We will first develop a chaining scheme. This scheme is related to, but different from, the scheme considered in Sect. 16.5 (which was well adapted to our limited goals there).

We say that a sequence $\mathcal T = (T_n)_{n\ge0}$ of subsets of $T$ is admissible if it satisfies

$$\operatorname{card} T_0 = 1 \tag{16.121}$$

and

$$\operatorname{card} T_n \le \varphi(4^n). \tag{16.122}$$

We do not require the sequence $(T_n)$ to be increasing. Let us consider the following quantities:

$$S_\varphi(\mathcal T) = \sup_{t\in T}\sum_{n\ge0}4^n d(t,T_n) \tag{16.123}$$

and

$$S^*_\varphi(\mathcal T) = \sum_{n\ge1}\sum_{s\in T_n}\frac{4^n d(s,T_{n-1})}{\varphi(4^n)}. \tag{16.124}$$

Consider the case $\varphi(x) = \exp(x^2) - 1$, which corresponds to Gaussian processes. Then $\operatorname{card} T_n \le \exp(4^{2n})$, and the quantity (16.123) is basically the right-hand side of (2.34); the difference is that we change $n$ into $4n$. The new feature here is the quantity $S^*_\varphi(\mathcal T)$. It was not needed in the Gaussian case or, more generally, in the case where one has “exponential tails”. The formulation of the following theorem is due again to W. Bednorz:
Theorem 16.8.1 Consider a process that satisfies (16.1). Then, for each admissible sequence $\mathcal T = (T_n)_{n\ge0}$, we have

$$\mathbb E\sup_{s,t\in T}|X_s - X_t| \le L\big(S_\varphi(\mathcal T) + S^*_\varphi(\mathcal T)\big). \tag{16.125}$$

It is not required here that $\mathbb E X_t = 0$.

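Proof For $n \ge 1$, let us define a map $\theta_n : T_n \to T_{n-1}$ such that for $s \in T_n$, one has

$$d(s,\theta_n(s)) = d(s,T_{n-1}). \tag{16.126}$$

We may assume that $S_\varphi(\mathcal T) < \infty$, for otherwise there is nothing to prove. This implies that for large $m$, $T_m$ is a good approximation of $T$. In particular, since $T$ is finite, there exists $m$ with $T = T_m$. Let us consider such a value of $m$. For $t \in T$, we define $\pi_m(t) = t$, and we define recursively $\pi_{n-1}(t) = \theta_n(\pi_n(t))$, so that (16.126) implies

$$d(\pi_n(t),\pi_{n-1}(t)) = d(\pi_n(t),T_{n-1}). \tag{16.127}$$

For $x, y \ge 0$, the inequality

$$y \le x + x\,\frac{\varphi(y)}{\varphi(x)} \tag{16.128}$$

is obvious if $y \le x$. If $x \le y$, it follows from the fact that $\varphi(x) \le x\varphi(y)/y$ by convexity of $\varphi$. We use (16.128) with $y = |X_s - X_{\theta_n(s)}|/d(s,\theta_n(s))$ and $x = 4^n$ to obtain (since $\varphi(y) = \varphi(|y|)$)

$$|X_s - X_{\theta_n(s)}| \le 4^n d(s,\theta_n(s)) + \frac{4^n d(s,\theta_n(s))}{\varphi(4^n)}\,\varphi\Big(\frac{X_s - X_{\theta_n(s)}}{d(s,\theta_n(s))}\Big). \tag{16.129}$$

Using this for $s = \pi_n(t)$ yields the following (using a crude bound to obtain a last term independent of $t$):

$$|X_{\pi_n(t)} - X_{\pi_{n-1}(t)}| \le 4^n d(\pi_n(t),\pi_{n-1}(t)) + \sum_{s\in T_n}\frac{4^n d(s,\theta_n(s))}{\varphi(4^n)}\,\varphi\Big(\frac{X_s - X_{\theta_n(s)}}{d(s,\theta_n(s))}\Big).$$

Combining with (16.44), if $T_0 = \{t_0\}$, we obtain

$$|X_t - X_{t_0}| \le \sum_{n\ge1}4^n d(\pi_{n-1}(t),\pi_n(t)) + \sum_{n\ge1}\sum_{s\in T_n}\frac{4^n d(s,\theta_n(s))}{\varphi(4^n)}\,\varphi\Big(\frac{X_s - X_{\theta_n(s)}}{d(s,\theta_n(s))}\Big), \tag{16.130}$$

and consequently,

$$\sup_{t\in T}|X_t - X_{t_0}| \le \sup_{t\in T}\sum_{n\ge1}4^n d(\pi_{n-1}(t),\pi_n(t)) + \sum_{n\ge1}\sum_{s\in T_n}\frac{4^n d(s,\theta_n(s))}{\varphi(4^n)}\,\varphi\Big(\frac{X_s - X_{\theta_n(s)}}{d(s,\theta_n(s))}\Big). \tag{16.131}$$

Taking expectation and using (16.1) yields

$$\mathbb E\sup_{t\in T}|X_t - X_{t_0}| \le \sup_{t\in T}\sum_{n\ge1}4^n d(\pi_{n-1}(t),\pi_n(t)) + S^*_\varphi(\mathcal T). \tag{16.132}$$

Now, recalling (16.127),

$$\begin{aligned}
d(\pi_{n-1}(t),\pi_n(t)) &= d(\pi_n(t),T_{n-1}) \le d(t,T_{n-1}) + d(t,\pi_n(t))\\
&\le d(t,T_{n-1}) + \sum_{k>n} d(\pi_k(t),\pi_{k-1}(t)).
\end{aligned}$$

Thus, using that $\sum_{n<k}4^n \le 4^k/2$, we get

$$\begin{aligned}
\sum_{n\ge1}4^n d(\pi_{n-1}(t),\pi_n(t)) &\le \sum_{n\ge1}4^n d(t,T_{n-1}) + \sum_{n\ge1}4^n\sum_{k>n}d(\pi_{k-1}(t),\pi_k(t))\\
&= \sum_{n\ge1}4^n d(t,T_{n-1}) + \sum_{k\ge1}\Big(\sum_{1\le n<k}4^n\Big)d(\pi_{k-1}(t),\pi_k(t))\\
&\le \sum_{n\ge1}4^n d(t,T_{n-1}) + \frac12\sum_{k\ge1}4^k d(\pi_{k-1}(t),\pi_k(t)),
\end{aligned}$$

so that, recalling (16.123), we get

$$\sum_{n\ge1}4^n d(\pi_{n-1}(t),\pi_n(t)) \le 2\sum_{n\ge1}4^n d(t,T_{n-1}) = 8\sum_{n\ge0}4^n d(t,T_n) \le 8S_\varphi(\mathcal T). \tag{16.133}$$

Combining with (16.132), this finishes the proof. $\square$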


Interestingly, the previous proof does not use (16.122)! 
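Inequality (16.133) holds path by path, so it can be checked on any finite example. The Python sketch below does this; it is an illustration, not from the source, and `chain_bound_check` is a hypothetical name. It builds the chains $\pi_n(t)$ exactly as in the proof, with $\theta_n(s)$ a nearest point of $T_{n-1}$. It then returns, for each $t$, the two sides of (16.133). As the remark above says, the cardinality condition (16.122) plays no role, and the random sets used below need not increase.

```python
import math
import random

def chain_bound_check(T, seq, d):
    """For each t in T return (lhs, rhs) with lhs = sum_{n>=1} 4^n d(pi_{n-1}(t), pi_n(t))
    and rhs = 8 sum_{n>=0} 4^n d(t, T_n). Here seq = [T_0, ..., T_m] with T_m = T."""
    m = len(seq) - 1

    def theta(n, s):
        # theta_n(s): a point of T_{n-1} closest to s, as in (16.126)
        return min(seq[n - 1], key=lambda u: d(s, u))

    results = []
    for t in T:
        pi = [None] * (m + 1)
        pi[m] = t                              # pi_m(t) = t
        for n in range(m, 0, -1):
            pi[n - 1] = theta(n, pi[n])        # pi_{n-1}(t) = theta_n(pi_n(t))
        lhs = sum(4 ** n * d(pi[n - 1], pi[n]) for n in range(1, m + 1))
        rhs = 8 * sum(4 ** n * min(d(t, u) for u in Tn) for n, Tn in enumerate(seq))
        results.append((lhs, rhs))
    return results

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(30)]
seq = [[pts[0]], random.sample(pts, 3), random.sample(pts, 10), pts]  # not increasing
pairs = chain_bound_check(pts, seq, math.dist)
```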




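Corollary 16.8.2 Define $e^*_0 = \Delta(T,d)$ and, for $n \ge 1$, define

$$e^*_n = \inf\big\{\epsilon > 0 ;\ \exists U \subset T,\ \operatorname{card} U \le \varphi(4^n),\ \forall t \in T,\ d(t,U) \le \epsilon\big\}. \tag{16.134}$$

Then

$$\mathbb E\sup_{s,t\in T}|X_s - X_t| \le L\sum_{n\ge0}4^n e^*_n. \tag{16.135}$$

Proof Consider an arbitrary point $t_0$ of $T$, and set $T_0 = \{t_0\}$. For $n \ge 1$, consider a subset $T_n$ of $T$ with $\operatorname{card} T_n \le \varphi(4^n)$ and $d(t, T_n) \le 2e^*_n$ for each $t \in T$. It is then obvious that the quantities $S_\varphi(\mathcal T)$ and $S^*_\varphi(\mathcal T)$ of (16.123) and (16.124) satisfy

$$S_\varphi(\mathcal T) \le L\sum_{n\ge0}4^n e^*_n;\qquad S^*_\varphi(\mathcal T) \le L\sum_{n\ge0}4^n e^*_n. \qquad\square$$

Exercise 16.8.3 Deduce Pisier's bound (16.5) from Corollary 16.8.2.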
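The number $e^*_n$ of (16.134) is the smallest covering radius of $T$ by at most $\varphi(4^n)$ of its own points. For a finite $T$ the infimum is attained, so it can be computed by brute force over subsets. The Python sketch below is usable only for very small $T$. It assumes $\varphi(4^n) \ge 1$, and the names are hypothetical. It evaluates $e^*_n$ and the right-hand side $\sum_n 4^ne^*_n$ of (16.135), without the constant $L$.

```python
from itertools import combinations

def covering_radius(T, d, N):
    """Min over U subset of T with card U <= N of max_t d(t, U), by brute force.
    Subsets of size exactly min(N, |T|) suffice, since adding points never hurts."""
    N = min(int(N), len(T))
    if N < 1:
        raise ValueError("this sketch assumes phi(4^n) >= 1")
    return min(max(min(d(t, u) for u in U) for t in T) for U in combinations(T, N))

def entropy_sum(T, d, phi):
    """sum_{n>=0} 4^n e*_n with e*_0 = diameter of T and e*_n as in (16.134)."""
    total = max(d(s, t) for s in T for t in T)   # n = 0 term: Delta(T, d)
    n = 1
    while True:
        e = covering_radius(T, d, phi(4 ** n))
        if e == 0:
            return total        # e*_n is nonincreasing, so all later terms vanish
        total += 4 ** n * e
        n += 1

T = list(range(8))
d = lambda s, t: abs(s - t)
phi = lambda x: x * x / 8       # illustrative: phi(4) = 2, phi(16) = 32 >= card T
```

With these choices, $e^*_0 = 7$, $e^*_1 = 2$ and $e^*_2 = 0$, so the sum is $7 + 4\cdot 2 = 15$.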


The bound of Theorem 16.8.1 raises two questions: How can we construct admissible sequences? How sharp is this result?

Definition 16.8.4 For a metric space $(T,d)$, let

$$S(T,d,\varphi) = \sup\Big\{\mathbb E\sup_{s,t\in T}|X_s - X_t|\Big\}, \tag{16.136}$$

where the supremum is taken over all the processes which satisfy (16.1).
The reader will need this definition throughout the rest of this chapter. We reformulate (16.125) as

$$S(T,d,\varphi) \le L\big(S_\varphi(\mathcal T) + S^*_\varphi(\mathcal T)\big), \tag{16.137}$$

and the question arises as to what extent this inequality is sharp for the best possible choice of $\mathcal T$. W. Bednorz has recently discovered a rather general setting where this is the case. To describe it, we need the following concept:


Definition 16.8.5 Consider $p > 1$. A distance $d$ on a metric space is called $p$-concave if $d^p$ is still a distance, i.e.,

$$d(s,t)^p \le d(s,v)^p + d(v,t)^p. \tag{16.138}$$

This definition is well adapted to the study of the distance $d(s,t) = \sqrt{|s-t|}$, which is 2-concave. Unfortunately, the usual distance on $\mathbb R^n$ is not $p$-concave, and, as we shall see later, this case is considerably more complex.
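On a finite set, $p$-concavity can be tested directly from (16.138). The Python sketch below (hypothetical name `is_p_concave`, with a small floating-point tolerance) runs the test on a grid in $[0,1]$. There $\sqrt{|s-t|}$ passes for $p = 2$, while the usual distance $|s-t|$ fails for $p = 2$: take $s = 0$, $t = 1$, $v = 1/2$.

```python
import math
from itertools import product

def is_p_concave(points, d, p, tol=1e-12):
    """Check (16.138): d(s,t)^p <= d(s,v)^p + d(v,t)^p for all s, t, v in points."""
    return all(d(s, t) ** p <= d(s, v) ** p + d(v, t) ** p + tol
               for s, t, v in product(points, repeat=3))

pts = [k / 10 for k in range(11)]
sqrt_dist = lambda s, t: math.sqrt(abs(s - t))
usual_dist = lambda s, t: abs(s - t)
```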
One of the results we will prove is that for a $p$-concave distance, the inequality (16.137) can be reversed. The proof is indirect. We will show that both sides of this inequality are equivalent to a new quantity, itself of independent interest.


Theorem 16.8.6 (W. Bednorz [8]) Assume that

$$\text{the function } x \mapsto \varphi^{-1}(1/x) \text{ is convex}. \tag{16.139}$$

Assume that the distance $d$ is $p$-concave. Then there exists a probability measure $\mu$ on $T$ for which

$$\sup_{t\in T}\int_0^{\Delta(T,d)}\varphi^{-1}\Big(\frac{1}{\mu(B(t,\epsilon))}\Big)\,d\epsilon \le K(p)\,S(T,d,\varphi), \tag{16.140}$$

where $S(T,d,\varphi)$ is defined in (16.136).


Condition (16.139) is inessential and is imposed only for simplicity. It is the behavior of $\varphi^{-1}$ at zero that matters.


Theorem 16.8.7 (W. Bednorz [11]) Consider a probability measure $\mu$ on $T$, and let

$$B = \sup_{t\in T}\int_0^{\Delta(T,d)}\varphi^{-1}\Big(\frac{1}{\mu(B(t,\epsilon))}\Big)\,d\epsilon. \tag{16.141}$$

Then there is an admissible sequence $\mathcal T$ of subsets of $T$ for which

$$S_\varphi(\mathcal T) \le LB, \qquad S^*_\varphi(\mathcal T) \le LB. \tag{16.142}$$

Thus, through (16.137) and (16.142), any probability measure $\mu$ yields a bound on $S(T,d,\varphi)$. In this context, such a probability $\mu$ on $(T,d)$ is traditionally called a majorizing measure. The importance of majorizing measures seemed to decrease considerably with the invention of the generic chaining, as they seemed to have limited use in the context of Gaussian processes (see Sect. 3.1). But, as we saw in Chap. 11, matters are more complicated than that.


Definition 16.8.8 For a metric space $(T,d)$, we define

$$M(T,d,\varphi) = \inf_\mu\ \sup_{t\in T}\int_0^{\Delta(T,d)}\varphi^{-1}\Big(\frac{1}{\mu(B(t,\epsilon))}\Big)\,d\epsilon, \tag{16.143}$$

where the infimum is taken over all probability measures $\mu$ on $T$.


Combining Theorems 16.8.1, 16.8.6, and 16.8.7, we have proved the following:
Theorem 16.8.9 Assuming (16.139), if the distance $d$ is $p$-concave, then

$$S(T,d,\varphi) \le L\inf_{\mathcal T}\big(S_\varphi(\mathcal T) + S^*_\varphi(\mathcal T)\big) \le L\,M(T,d,\varphi) \le K(p)\,S(T,d,\varphi). \tag{16.144}$$

Thus, $S(T,d,\varphi)$ is of the same order as $M(T,d,\varphi)$, but the determination of the quantity $M(T,d,\varphi)$ is by no means easy.

Let us turn to the proof of Theorem 16.8.6. A $p$-concave distance satisfies the following improved version of the triangle inequality:


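Lemma 16.8.10 If the distance $d$ is $p$-concave, then for $s, t, v \in T$, we have

$$d(s,v) - d(t,v) \le d(s,t)\Big(\frac{d(s,t)}{d(t,v)}\Big)^{p-1}. \tag{16.145}$$

Proof We have

$$d(s,v)^p \le d(t,v)^p + d(s,t)^p = d(t,v)^p\Big(1 + \frac{d(s,t)^p}{d(t,v)^p}\Big),$$

so that, since (crudely) $(1+x)^{1/p} \le 1 + x$ for $x \ge 0$ and $p \ge 1$,

$$d(s,v) \le d(t,v)\Big(1 + \frac{d(s,t)^p}{d(t,v)^p}\Big)^{1/p} \le d(t,v)\Big(1 + \frac{d(s,t)^p}{d(t,v)^p}\Big) = d(t,v) + d(s,t)\Big(\frac{d(s,t)}{d(t,v)}\Big)^{p-1}. \qquad\square$$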
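Inequality (16.145) can also be spot-checked numerically. The hedged Python sketch below (hypothetical name `check_16_145`) tests it on a grid for two $p$-concave distances: $|s-t|^{1/2}$ with $p = 2$, and $|s-t|^{1/3}$ with $p = 3$. It also shows that the inequality can fail when the distance is not $p$-concave. Take $|s-t|$ with $p = 2$, $s = 0$, $t = 0.1$, $v = 0.3$: the left side is $0.1$ and the right side is $0.05$.

```python
from itertools import product

def check_16_145(points, d, p, tol=1e-12):
    """Check d(s,v) - d(t,v) <= d(s,t) (d(s,t)/d(t,v))^(p-1) whenever d(t,v) > 0."""
    for s, t, v in product(points, repeat=3):
        dtv = d(t, v)
        if dtv > 0 and d(s, v) - dtv > d(s, t) * (d(s, t) / dtv) ** (p - 1) + tol:
            return False
    return True

pts = [k / 10 for k in range(11)]
```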


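Lemma 16.8.11 Consider $s, t \in T$. Then for each probability measure $\mu$ on $T$, one has

$$\int_T d\mu(\omega)\int_{\min(d(s,\omega),d(t,\omega))}^{\max(d(s,\omega),d(t,\omega))}\frac{1}{\mu(B(\omega,3\epsilon))}\,d\epsilon \le K(p)\,d(s,t). \tag{16.146}$$

Proof Let us consider the set $A = \{\omega \in T ;\ d(t,\omega) \le d(s,\omega)\}$, so that for $\omega \in A$, the second integral in (16.146) is from $d(t,\omega)$ to $d(s,\omega)$. For $d(t,\omega) \le \epsilon$, we have $B(t,2\epsilon) \subset B(\omega,3\epsilon)$. Hence it suffices to prove that

$$\int_A d\mu(\omega)\int_{d(t,\omega)}^{d(s,\omega)}\frac{1}{\mu(B(t,2\epsilon))}\,d\epsilon \le K(p)\,d(s,t) \tag{16.147}$$

and to use the similar result where $s$ and $t$ are exchanged. Let

$$A_0 = \{\omega \in A ;\ d(t,\omega) \le 2d(s,t)\},$$

so that

$$\int_{A_0} d\mu(\omega)\int_{d(t,\omega)}^{d(s,\omega)}\frac{d\epsilon}{\mu(B(t,2\epsilon))} = \iint \mathbf 1_{\{d(t,\omega)\le\epsilon\le d(s,\omega)\}}\mathbf 1_{\{d(t,\omega)\le 2d(s,t)\}}\frac{1}{\mu(B(t,2\epsilon))}\,d\epsilon\,d\mu(\omega).$$

Then, since $d(s,\omega) \le d(s,t) + d(t,\omega)$, for $\epsilon \le d(s,\omega)$ and $d(t,\omega) \le 2d(s,t)$ we have $\epsilon \le 3d(s,t)$, so that

$$\begin{aligned}
\int_{A_0} d\mu(\omega)\int_{d(t,\omega)}^{d(s,\omega)}\frac{d\epsilon}{\mu(B(t,2\epsilon))}
&\le \iint \mathbf 1_{\{d(t,\omega)\le\epsilon\}}\mathbf 1_{\{\epsilon\le 3d(s,t)\}}\frac{1}{\mu(B(t,2\epsilon))}\,d\epsilon\,d\mu(\omega)\\
&= \int \mathbf 1_{\{\epsilon\le 3d(s,t)\}}\frac{\mu(B(t,\epsilon))}{\mu(B(t,2\epsilon))}\,d\epsilon \le \int_0^{3d(s,t)} d\epsilon = 3d(s,t).
\end{aligned} \tag{16.148}$$

Next, for $n \ge 1$, let

$$A_n = \{\omega \in A ;\ 2^n d(s,t) < d(t,\omega) \le 2^{n+1}d(s,t)\} \subset B(t,2^{n+1}d(s,t)),$$

so that the sets $(A_n)_{n\ge0}$ cover $A$. It follows from (16.145) that for $\omega \in A_n$,

$$d(s,\omega) - d(t,\omega) \le 2^{-n(p-1)}d(s,t).$$

Furthermore, for $\omega \in A_n$ and $d(t,\omega) \le \epsilon$, we have $2^{n+1}d(s,t) \le 2\epsilon$, so that $B(t,2^{n+1}d(s,t)) \subset B(t,2\epsilon)$ and

$$\int_{d(t,\omega)}^{d(s,\omega)}\frac{d\epsilon}{\mu(B(t,2\epsilon))} \le 2^{-n(p-1)}d(s,t)\,\frac{1}{\mu(B(t,2^{n+1}d(s,t)))}.$$

Consequently, since $\mu(A_n) \le \mu(B(t,2^{n+1}d(s,t)))$,

$$\int_{A_n} d\mu(\omega)\int_{d(t,\omega)}^{d(s,\omega)}\frac{d\epsilon}{\mu(B(t,2\epsilon))} \le 2^{-n(p-1)}d(s,t).$$

Then (16.147) follows by summation over $n \ge 1$ and combining with (16.148). $\square$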
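Proposition 16.8.12 If the distance $d$ is $p$-concave, then for each probability measure $\mu$ on $T$, one has

$$\int_T d\mu(\omega)\int_0^{\Delta(T,d)}\varphi^{-1}\Big(\frac{1}{\mu(B(\omega,\epsilon))}\Big)\,d\epsilon \le K(p)\,S(T,d,\varphi), \tag{16.149}$$

where $S(T,d,\varphi)$ is defined in (16.136).
Proof On the probability space $(T,\mu)$, consider the process $(X_t)_{t\in T}$ given by

$$X_t(\omega) = c\int_{\min(\Delta(T,d)/2,\ d(t,\omega))}^{\Delta(T,d)/2}\varphi^{-1}\Big(\frac{1}{\mu(B(\omega,3\epsilon))}\Big)\,d\epsilon, \tag{16.150}$$

where the constant $c \le 1$ will be determined later. Next, we claim that

$$\sup_{s,t\in T}|X_s(\omega) - X_t(\omega)| \ge c\int_0^{\Delta(T,d)/2}\varphi^{-1}\Big(\frac{1}{\mu(B(\omega,3\epsilon))}\Big)\,d\epsilon.$$

To see this, we choose $s$ such that $d(s,\omega) \ge \Delta(T,d)/2$, so that $X_s(\omega) = 0$. We choose $t = \omega$, and we compute $X_t(\omega)$ by the formula (16.150). Consequently,

$$\mathbb E\sup_{s,t\in T}|X_s - X_t| \ge c\int_T d\mu(\omega)\int_0^{\Delta(T,d)/2}\varphi^{-1}\Big(\frac{1}{\mu(B(\omega,3\epsilon))}\Big)\,d\epsilon. \tag{16.151}$$

Next, we set $a_0(\omega) = \min(d(s,\omega),d(t,\omega))$ and $b_0(\omega) = \max(d(s,\omega),d(t,\omega))$. We also set $a(\omega) = \min(\Delta(T,d)/2, a_0(\omega))$ and $b(\omega) = \min(\Delta(T,d)/2, b_0(\omega))$. Then

$$|X_s(\omega) - X_t(\omega)| = c\int_{a(\omega)}^{b(\omega)}\varphi^{-1}\Big(\frac{1}{\mu(B(\omega,3\epsilon))}\Big)\,d\epsilon.$$

Since $b(\omega) - a(\omega) \le d(s,t)$, we have $c(b(\omega) - a(\omega))/d(s,t) \le 1$. We use the convexity of $\varphi$ (together with $\varphi(0) = 0$) in the first inequality below, and Jensen's inequality in the second inequality:

$$\begin{aligned}
\varphi\Big(\frac{X_s(\omega)-X_t(\omega)}{d(s,t)}\Big)
&= \varphi\Big(\frac{c(b(\omega)-a(\omega))}{d(s,t)}\cdot\frac{1}{b(\omega)-a(\omega)}\int_{a(\omega)}^{b(\omega)}\varphi^{-1}\Big(\frac{1}{\mu(B(\omega,3\epsilon))}\Big)d\epsilon\Big)\\
&\le \frac{c(b(\omega)-a(\omega))}{d(s,t)}\,\varphi\Big(\frac{1}{b(\omega)-a(\omega)}\int_{a(\omega)}^{b(\omega)}\varphi^{-1}\Big(\frac{1}{\mu(B(\omega,3\epsilon))}\Big)d\epsilon\Big)\\
&\le \frac{c}{d(s,t)}\int_{a(\omega)}^{b(\omega)}\frac{1}{\mu(B(\omega,3\epsilon))}\,d\epsilon.
\end{aligned} \tag{16.152}$$

Now, if $a_0(\omega) \ge \Delta(T,d)/2$, then $b_0(\omega) \ge a_0(\omega) \ge \Delta(T,d)/2$. Then $a(\omega) = b(\omega) = \Delta(T,d)/2$, and the term in the last line of (16.152) is 0. If $a_0(\omega) < \Delta(T,d)/2$, then $a(\omega) = a_0(\omega) \le b(\omega) \le b_0(\omega)$. It then follows from Lemma 16.8.11 that the integral with respect to $\mu$ of the term on the last line of (16.152) is $\le cK(p)$. Consequently, we may choose $c = 1/K(p)$, depending on $p$ only, such that (16.1) holds.


Combining (16.151) with the definition of $S(T,d,\varphi)$, we then obtain

$$\int_T d\mu(\omega)\int_0^{\Delta(T,d)/2}\varphi^{-1}\Big(\frac{1}{\mu(B(\omega,3\epsilon))}\Big)\,d\epsilon \le K(p)\,S(T,d,\varphi),$$

and a change of variable then completes the proof. $\square$


Proof of Theorem 16.8.6 Combine Proposition 16.8.12 and Lemma 3.3.3 used for the function $x \mapsto \varphi^{-1}(1/x)$. $\square$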


We now turn to the proof of Theorem 16.8.7. We have to use a probability measure as in (16.141) to construct a suitable admissible sequence. There is a genuine difficulty in this construction: the measure of the balls $B(t,\epsilon)$ can vary greatly for a small variation of $t$. This difficulty has been bypassed in full generality by an argument of W. Bednorz, which we present now. This argument is so effective that the difficulty might no longer be noticed. Without loss of generality, we assume

$$\varphi(1) = 1, \tag{16.153}$$

but (16.139) is not required.
The proof of Theorem 16.8.7 is based on the functions $\epsilon_n(t)$ defined for $n \ge 0$ as

$$\epsilon_n(t) = \inf\Big\{\epsilon > 0 ;\ \mu(B(t,\epsilon)) \ge \frac{1}{\varphi(4^n)}\Big\}. \tag{16.154}$$

This quantity is well defined since $\varphi(4^n) \ge 1$ for $n \ge 0$.
Lemma 16.8.13 We recall the quantity $B$ of (16.141). We have

$$\mu(B(t,\epsilon_n(t))) \ge \frac{1}{\varphi(4^n)}, \tag{16.155}$$

$$|\epsilon_n(s) - \epsilon_n(t)| \le d(s,t), \tag{16.156}$$

$$\forall t \in T,\quad \sum_{n\ge0}4^n\epsilon_n(t) \le 2B. \tag{16.157}$$

Proof First, (16.155) is obvious. Next, $B(t,\epsilon) \subset B(s,\epsilon + d(s,t))$, so $\epsilon_n(s) \le \epsilon_n(t) + d(s,t)$, and (16.156) follows. Next, since

$$\epsilon < \epsilon_n(t) \ \Longrightarrow\ \varphi^{-1}\Big(\frac{1}{\mu(B(t,\epsilon))}\Big) > 4^n,$$

we have

$$B \ge \int_0^{\epsilon_0(t)}\varphi^{-1}\Big(\frac{1}{\mu(B(t,\epsilon))}\Big)\,d\epsilon \ge \sum_{n\ge0}4^n\big(\epsilon_n(t) - \epsilon_{n+1}(t)\big).$$

Now,

$$\sum_{n\ge0}4^n\big(\epsilon_n(t) - \epsilon_{n+1}(t)\big) = \sum_{n\ge0}4^n\epsilon_n(t) - \sum_{n\ge1}4^{n-1}\epsilon_n(t) \ge \frac12\sum_{n\ge0}4^n\epsilon_n(t). \qquad\square$$
n>0 n>0 n>1 n>0 


Lemma 16.8.14 For each $n \ge 0$, there exists a subset $T_n$ of $T$ that satisfies the following conditions:

$$\operatorname{card} T_n \le \varphi(4^n). \tag{16.158}$$

$$\text{The balls } B(t,\epsilon_n(t)) \text{ for } t \in T_n \text{ are disjoint.} \tag{16.159}$$

$$\forall t \in T,\quad d(t,T_n) \le 4\epsilon_n(t). \tag{16.160}$$

$$\forall t \in T_n,\ \forall s \in B(t,\epsilon_n(t)),\quad \epsilon_n(s) \ge \frac{\epsilon_n(t)}{2}. \tag{16.161}$$

Proof We define $D_0 = T$, and we choose $t_1 \in D_0$ such that $\epsilon_n(t_1)$ is as small as possible. Assuming that we have constructed $D_{k-1} \neq \emptyset$, we choose $t_k \in D_{k-1}$ such that $\epsilon_n(t_k)$ is as small as possible, and we define

$$D_k = \{t \in D_{k-1} ;\ d(t,t_k) \ge 2(\epsilon_n(t) + \epsilon_n(t_k))\}.$$

The construction continues as long as possible. It stops at the first integer $p$ for which $D_p = \emptyset$. We define $T_n = \{t_1, t_2, \dots, t_p\}$. Consider $t_k, t_{k'} \in T_n$ with $k < k'$. Then, by construction, and since the sequence $(D_k)$ decreases, $t_{k'} \in D_k$, so that

$$d(t_k,t_{k'}) \ge 2(\epsilon_n(t_k) + \epsilon_n(t_{k'})),$$

and therefore the balls $B(t_k,\epsilon_n(t_k))$ and $B(t_{k'},\epsilon_n(t_{k'}))$ are disjoint. This proves (16.159), and (16.159) together with (16.155) implies (16.158). To prove (16.160), consider $t \in T$ and the largest $k \ge 1$ such that $t \in D_{k-1}$. Then, by the choice of $t_k$, we have $\epsilon_n(t) \ge \epsilon_n(t_k)$. Since by definition of $k$ we have $t \notin D_k$, the definition of $D_k$ shows that

$$d(t,t_k) < 2(\epsilon_n(t) + \epsilon_n(t_k)) \le 4\epsilon_n(t),$$

and since $t_k \in T_n$, this proves (16.160).

Finally, consider $t_k$ and $s \in B(t_k,\epsilon_n(t_k))$. If $s \in D_{k-1}$, then $\epsilon_n(s) \ge \epsilon_n(t_k)$, and (16.161) is proved. Otherwise, the unique $k'$ such that $s \in D_{k'-1}$ and $s \notin D_{k'}$ satisfies $k' < k$. Since $s \in D_{k'-1}$ but $s \notin D_{k'}$, the definition of this set shows that

$$d(s,t_{k'}) < 2(\epsilon_n(s) + \epsilon_n(t_{k'})),$$

and since $d(s,t_k) \le \epsilon_n(t_k)$, we get

$$d(t_k,t_{k'}) \le d(s,t_k) + d(s,t_{k'}) < \epsilon_n(t_k) + 2(\epsilon_n(s) + \epsilon_n(t_{k'})). \tag{16.162}$$

On the other hand, since $k' < k$, we have $t_k \in D_{k-1} \subset D_{k'}$, so the definition of this set implies

$$d(t_k,t_{k'}) \ge 2(\epsilon_n(t_k) + \epsilon_n(t_{k'})),$$

and comparing with (16.162) completes the proof of (16.161). $\square$
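On a finite space, both the functions $\epsilon_n$ of (16.154) and the greedy construction of Lemma 16.8.14 are directly computable. The Python sketch below makes three choices of its own, each labeled here. First, it uses exact `Fraction` weights so that the mass comparisons are exact. Second, it uses the illustrative choice $\varphi(x) = x^2$, so that the threshold is $1/16^n$. Third, it removes $t_k$ from $D_k$ explicitly, which matters only when $\epsilon_n(t_k) = 0$. The names `eps_n` and `greedy_T_n` are hypothetical. The test then checks (16.155) through (16.161) on a small example.

```python
from fractions import Fraction

def eps_n(t, points, mu, d, level):
    """epsilon_n(t) of (16.154) on a finite space, with level = 1/phi(4^n): the
    smallest radius r among the values d(t, u) whose closed ball has mass >= level."""
    for r in sorted({d(t, u) for u in points}):
        if sum(mu[u] for u in points if d(t, u) <= r) >= level:
            return r
    raise ValueError("level exceeds the total mass of mu")

def greedy_T_n(points, eps, d):
    """Greedy construction from the proof of Lemma 16.8.14."""
    D, centers = list(points), []
    while D:
        tk = min(D, key=lambda t: eps[t])    # t_k: smallest epsilon_n in D_{k-1}
        centers.append(tk)
        # D_k = {t in D_{k-1} : d(t, t_k) >= 2(eps(t) + eps(t_k))}, t_k removed explicitly
        D = [t for t in D if t != tk and d(t, tk) >= 2 * (eps[t] + eps[tk])]
    return centers

points = list(range(20))
d = lambda s, t: abs(s - t)
mu = {i: Fraction(i + 1, 210) for i in points}   # a probability: weights sum to 1
results = {}
for n in range(3):
    level = Fraction(1, 16 ** n)                 # 1/phi(4^n) with phi(x) = x^2
    eps = {t: eps_n(t, points, mu, d, level) for t in points}
    results[n] = (level, eps, greedy_T_n(points, eps, d))
```

For $n = 0$ every ball must carry the full mass, so a single center is selected, matching $\operatorname{card} T_0 = 1$.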


Proof of Theorem 16.8.7 For $n \ge 0$, we consider the set $T_n$ provided by Lemma 16.8.14, so $\operatorname{card} T_0 = 1$. Combining (16.157) and (16.160), we obtain

$$\sum_{n\ge0}4^n d(t,T_n) \le 8B,$$

and this proves that $S_\varphi(\mathcal T) \le 8B$.
Next, $\mu(B(s,\epsilon_n(s))) \ge 1/\varphi(4^n)$ by (16.155), and $d(s,T_{n-1}) \le 4\epsilon_{n-1}(s)$ by (16.160). Hence, for $n \ge 1$, we have

$$\sum_{s\in T_n}\frac{d(s,T_{n-1})}{\varphi(4^n)} \le 4\sum_{s\in T_n}\epsilon_{n-1}(s)\,\mu(B(s,\epsilon_n(s))).$$

It follows from (16.161) that for $t \in B(s,\epsilon_n(s))$, one has $\epsilon_n(s) \le 2\epsilon_n(t)$. Combining with (16.156) implies

$$\epsilon_{n-1}(s) \le \epsilon_{n-1}(t) + \epsilon_n(s) \le \epsilon_{n-1}(t) + 2\epsilon_n(t).$$

Since the balls $B(s,\epsilon_n(s))$, $s \in T_n$, are disjoint, this yields

$$\sum_{s\in T_n}\frac{4^n d(s,T_{n-1})}{\varphi(4^n)} \le 4\cdot4^n\int_T\big(\epsilon_{n-1}(t) + 2\epsilon_n(t)\big)\,d\mu(t).$$

Summation over $n \ge 1$ and use of (16.157) conclude the proof. $\square$


When we do not assume that the distance is $p$-concave, the last inequality in (16.144) need not hold. This considerably more complex situation will be briefly discussed in the next section. We end the present section by discussing two more specialized questions.

A striking feature of Theorem 16.2.1 is the following. We studied processes that satisfied $\mathbb E(X_s - X_t)^2 \le d(s,t)$, where $d$ is the usual distance on the unit interval. Yet we ended up considering the sequence $(\mathcal I_n)$ of partitions of this unit interval and, implicitly, the distance $\delta$ given by $\delta(s,t) = 2^{-n}$, where $n$ is the largest integer for which $s$ and $t$ belong to the same element of $\mathcal I_n$. This distance is ultrametric, i.e., it satisfies

$$\forall s,t,v \in T,\quad \delta(s,t) \le \max(\delta(s,v), \delta(t,v)). \tag{16.163}$$

Note that a distance is ultrametric if and only if it is $p$-concave for all $p$. Ultrametric distances are intimately connected to increasing sequences of partitions, because the balls of a given radius form a partition in an ultrametric space. As the following shows, the implicit occurrence of an ultrametric structure is very frequent:
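The dyadic distance $\delta$ just described is easy to compute, and the ultrametric inequality (16.163) can be checked on a grid. The Python sketch below makes two assumptions: $\mathcal I_n$ is the family of half-open dyadic intervals of $[0,1)$, and the helper names are hypothetical. It confirms that $\delta$ is ultrametric, that $\delta(s,t) \ge |s-t|$, and that the usual distance is not ultrametric.

```python
import math
from itertools import product

def delta(s, t, max_depth=60):
    """delta(s, t) = 2^-n, where n is the largest integer such that s and t lie in the
    same dyadic interval [j 2^-n, (j+1) 2^-n) (points of [0, 1))."""
    if s == t:
        return 0.0
    n = 0
    while n < max_depth and math.floor(2 ** (n + 1) * s) == math.floor(2 ** (n + 1) * t):
        n += 1
    return 2.0 ** -n

def is_ultrametric(points, dist):
    """Check (16.163) on all triples of points."""
    return all(dist(s, t) <= max(dist(s, v), dist(t, v))
               for s, t, v in product(points, repeat=3))

grid = [k / 16 for k in range(16)]
```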


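Theorem 16.8.15 (W. Bednorz [12]) Let us assume that the Young function satisfies

$$\forall k \ge 1,\quad \sum_{n\ge k}\frac{4^n}{\varphi(4^n)} \le C\,\frac{4^k}{\varphi(4^k)}. \tag{16.164}$$

Consider an admissible sequence $\mathcal T$ of subsets of $(T,d)$. Then there exist an ultrametric distance $\delta \ge d$ and an admissible sequence $\mathcal T^*$ of subsets of $(T,\delta)$ such that

$$S_\delta(\mathcal T^*) + S^*_\delta(\mathcal T^*) \le K(C)\big(S_\varphi(\mathcal T) + S^*_\varphi(\mathcal T)\big), \tag{16.165}$$

where the subscript $\delta$ indicates that the quantities (16.123) and (16.124) are computed for the distance $\delta$, and where $K(C)$ depends on $C$ only.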
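Condition (16.164) asks that the series $\sum_n 4^n/\varphi(4^n)$ be dominated by its first term, up to the factor $C$. The Python sketch below is a truncated numerical check, not a proof. It estimates the best constant $C$ for two power functions chosen only for illustration. The cut-offs `kmax` and `nmax` and the name `young_constant` are assumptions of the sketch. For $\varphi(x) = x^2$ the terms are $4^{-n}$, which gives $C = 4/3$. For $\varphi(x) = x^3$ they are $16^{-n}$, which gives $C = 16/15$.

```python
def young_constant(phi, kmax=20, nmax=60):
    """Estimate the smallest C with sum_{n>=k} 4^n/phi(4^n) <= C 4^k/phi(4^k)
    for 1 <= k <= kmax, truncating each series at n = nmax."""
    C = 0.0
    for k in range(1, kmax + 1):
        tail = sum(4.0 ** n / phi(4.0 ** n) for n in range(k, nmax + 1))
        C = max(C, tail / (4.0 ** k / phi(4.0 ** k)))
    return C

C_square = young_constant(lambda x: x * x)   # about 4/3
C_cube = young_constant(lambda x: x ** 3)    # about 16/15
```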


In words, this means that if the existence of an admissible sequence provides 
a bound for processes that satisfy the increment condition (16.1), then there is an 
ultrametric distance 6 greater than d and such that the processes satisfying the 
increment condition (16.1) for this greater distance basically still satisfy the same 
bound. 
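Proof Let $\mathcal{T} = (T_n)_{n\ge 0}$. As a first step, we prove that we may assume that the sequence $(T_n)$ increases. Define $T'_0 = T_0$ and, for $n \ge 1$, define $T'_n = \bigcup_{k<n} T_k$. Thus,

$$\operatorname{card} T'_n \le \sum_{k<n}\varphi(4^k) \le \sum_{k<n}4^{k-n}\varphi(4^n) \le \varphi(4^n)\,,$$

where in the second inequality we have used that $\varphi(x) \le x\varphi(y)/y$ for $0 \le x \le y$, by convexity of $\varphi$. Thus, the sequence $\mathcal{T}' = (T'_n)_{n\ge 0}$ is admissible. Since $d(t, T'_n) \le d(t, T_{n-1})$ for $n \ge 1$, it follows from the definition (16.123) that $S_\varphi(\mathcal{T}') \le 4S_\varphi(\mathcal{T})$. Next, we observe that for $n \ge 2$, and since $T'_n = \bigcup_{k<n} T_k$,

$$\sum_{s\in T'_n} d(s, T'_{n-1}) \le \sum_{k<n}\sum_{s\in T_k} d(s, T'_{n-1}) \le \sum_{1\le k<n}\sum_{s\in T_k} d(s, T_{k-1})\,, \qquad (16.166)$$

because $T_0 \subset T'_{n-1}$ and $T_{k-1} \subset T'_{n-1}$ for $1 \le k < n$. For $n = 1$, $d(s, T'_0) = 0$ for $s \in T'_1 = T_0$. Thus, using (16.166) in the second line and (16.164) in the last line,

$$\begin{aligned}
S^*_\varphi(\mathcal{T}') &= \sum_{n\ge 1}\sum_{s\in T'_n}\frac{4^n d(s, T'_{n-1})}{\varphi(4^n)}\\
&\le \sum_{n\ge 1}\sum_{1\le k<n}\sum_{s\in T_k}\frac{4^n d(s, T_{k-1})}{\varphi(4^n)}\\
&= \sum_{k\ge 1}\sum_{s\in T_k} d(s, T_{k-1})\sum_{n>k}\frac{4^n}{\varphi(4^n)}\\
&\le C\sum_{k\ge 1}\sum_{s\in T_k}\frac{4^k d(s, T_{k-1})}{\varphi(4^k)} = C S^*_\varphi(\mathcal{T})\,.
\end{aligned}$$

In summary, the sequence $\mathcal{T}'$ is admissible and increasing, and satisfies $S_\varphi(\mathcal{T}') \le 4S_\varphi(\mathcal{T})$ and $S^*_\varphi(\mathcal{T}') \le C S^*_\varphi(\mathcal{T})$. Therefore, replacing $\mathcal{T}$ by $\mathcal{T}'$, we now assume that the sequence $(T_n)$ increases.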

Let us consider the points $\pi_n(t)$ as in the proof of Theorem 16.8.1. Since the sequence $(T_n)$ increases, we have $\pi_k(t) = t$ for $t \in T_n$ and $k \ge n$. Given $s, t \in T$, let us consider the largest integer $m$ for which $\pi_m(s) = \pi_m(t)$ and define

$$\delta(s,t) = 2\max\Big(\sum_{k\ge m} d(\pi_k(t), \pi_{k+1}(t)),\ \sum_{k\ge m} d(\pi_k(s), \pi_{k+1}(s))\Big)\,. \qquad (16.167)$$

It is straightforward to check that this defines an ultrametric distance. Moreover, we have $d(s,t) \le d(s, \pi_m(s)) + d(t, \pi_m(s)) = d(s, \pi_m(s)) + d(t, \pi_m(t))$, using the definition of $m$ in the last equality. Furthermore, since $t = \pi_n(t)$ for $n$ large enough, the triangle inequality implies $d(t, \pi_m(t)) \le \sum_{k\ge m} d(\pi_k(t), \pi_{k+1}(t))$, and the same inequality for $s$ then yields that

$$d(s,t) \le \sum_{k\ge m} d(\pi_k(t), \pi_{k+1}(t)) + \sum_{k\ge m} d(\pi_k(s), \pi_{k+1}(s)) \le \delta(s,t)\,.$$

Consider now $t \in T$ and $s = \pi_n(t) \in T_n$. Then $\pi_k(t) = \pi_k(s)$ for $k \le n$, and $\pi_k(s) = s$ for $k \ge n$. Consequently, the definition of $\delta$ shows that

$$\delta(t, T_n) \le 2\sum_{k\ge n} d(\pi_k(t), \pi_{k+1}(t))\,. \qquad (16.168)$$

Interchanging as usual the sums over $k$ and $n$ yields

$$\sum_{n\ge 0}4^n\delta(t, T_n) \le 2\sum_{k\ge 0} d(\pi_k(t), \pi_{k+1}(t))\sum_{n\le k}4^n \le L\sum_{k\ge 0}4^k d(\pi_k(t), \pi_{k+1}(t))\,.$$

Denoting by $\mathcal{T}^*$ the admissible sequence $(T_n)$, (16.133) then proves that $S_\delta(\mathcal{T}^*) \le L S_\varphi(\mathcal{T})$.

Now, if $t \in T_{n+1}$, we have $\pi_k(t) = t$ for $k \ge n+1$, and thus (16.168) and (16.127) yield $\delta(t, T_n) \le 2d(\pi_{n+1}(t), \pi_n(t)) = 2d(\pi_{n+1}(t), T_n) = 2d(t, T_n)$. This implies that $S^*_\delta(\mathcal{T}^*) \le 2S^*_\varphi(\mathcal{T})$. □


The conclusion of Theorem 16.8.15 is not true without some kind of condition on $\varphi$ such as (16.164). A counterexample is provided in [104] in the case $\varphi(x) = x$.
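Condition (16.164) is easy to probe numerically. The sketch below (the helper `tail_ratio` and the truncation level `n_max` are illustrative assumptions) evaluates a truncation of $\sum_{n\ge k}(4^n/\varphi(4^n))\big/(4^k/\varphi(4^k))$. For $\varphi(x) = x^2$ the full ratio equals $4/3$, so (16.164) holds with $C = 4/3$. For $\varphi(x) = x$ every term equals 1, so the ratio grows with the number of terms and no constant $C$ works. This matches the case of the counterexample mentioned above.

```python
def tail_ratio(phi, k: int, n_max: int) -> float:
    """Truncated sum_{n>=k} (4^n / phi(4^n)), divided by 4^k / phi(4^k)."""
    head = 4 ** k / phi(4 ** k)
    tail = sum(4 ** n / phi(4 ** n) for n in range(k, n_max + 1))
    return tail / head


# phi(x) = x^2: the ratio is (4/3)(1 - 4^{-(n_max - k + 1)}), just below 4/3.
print(tail_ratio(lambda x: x * x, 3, 60))
# phi(x) = x: the ratio is n_max - k + 1, unbounded as n_max grows.
print(tail_ratio(lambda x: x, 1, 50))   # 50.0
```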

Finally, we briefly investigate the extent to which we can improve (16.125) by requiring a stronger integrability condition on $\sup_{s,t}|X_s - X_t|$. For a Young function $\psi$ and a r.v. $X$, let us define

$$\|X\|_\psi = \inf\{u > 0;\ E\psi(X/u) \le 1\}\,, \qquad (16.169)$$

so that the distance of (16.2) is simply $\|X_s - X_t\|_\varphi$. It would be nice if we could replace the left-hand side of (16.125) by

$$\Big\|\sup_{s,t\in T}|X_s - X_t|\Big\|_\varphi\,,$$

but unfortunately this is not true. However, we have the following (which is a special case of a general principle, see [104]):


Proposition 16.8.16 Assume that for a Young function $\psi$, we have

$$x \ge \varphi^{-1}(1) = 1,\ y \ge 1 \implies \varphi(xy) \ge \varphi(x)\psi(y)\,. \qquad (16.170)$$

Then we may replace (16.125) by

$$\Big\|\sup_{s,t\in T}|X_s - X_t|\Big\|_\psi \le L\big(S_\varphi(\mathcal{T}) + S^*_\varphi(\mathcal{T})\big)\,. \qquad (16.171)$$

In particular, we may improve the metric entropy bound (16.5) into

$$\Big\|\sup_{s,t\in T}|X_s - X_t|\Big\|_\psi \le L\int_0^{\Delta(T,d)}\varphi^{-1}(N(T,d,\epsilon))\,d\epsilon\,. \qquad (16.172)$$


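Proof Proceeding as in (16.130), we observe that for each number $a > 0$, we have

$$\sup_{t\in T}|X_t - X_{t_0}| \le a\Big(\sup_{t\in T}\sum_{n\ge 1}4^n d(\pi_{n-1}(t), \pi_n(t)) + \sum_{n\ge 1}\sum_{s\in T_n} b(n,s)\,\varphi\Big(\frac{Y_{n,s}}{a}\Big)\Big)\,, \qquad (16.173)$$

where we lighten notation by writing

$$b(n,s) = \frac{4^n d(s, \theta_n(s))}{\varphi(4^n)}\,;\qquad Y_{n,s} = \frac{X_s - X_{\theta_n(s)}}{d(s, \theta_n(s))}\,.$$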
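Let us define

$$h(\omega) = \inf\Big\{a > 0;\ \sum_{n\ge 1}\sum_{s\in T_n} b(n,s)\,\varphi\Big(\frac{Y_{n,s}(\omega)}{a}\Big) \le 2S^*_\varphi(\mathcal{T})\Big\}\,, \qquad (16.174)$$

so that obviously $h(\omega) \ge 0$. Using (16.173) for any $a > h(\omega)$ implies

$$\sup_{s,t\in T}|X_s(\omega) - X_t(\omega)| \le 2h(\omega)\Big(\sup_{t\in T}\sum_{n\ge 1}4^n d(\pi_{n-1}(t), \pi_n(t)) + 2S^*_\varphi(\mathcal{T})\Big)\,,$$

and recalling (16.133), it suffices to prove that $\|h\|_\psi \le L$. Let us consider $g(\omega) < h(\omega)$. We deduce from (16.174)

$$\sum_{n\ge 1}\sum_{s\in T_n} b(n,s)\,\varphi\Big(\frac{Y_{n,s}(\omega)}{g(\omega)}\Big) > 2S^*_\varphi(\mathcal{T})\,. \qquad (16.175)$$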


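Recalling that $\varphi(1) = 1$ (so that $\varphi(x) \le 1$ for $|x| \le 1$) and that the sum of the coefficients $b(n,s)$ for $s \in T_n$ and $n \ge 1$ is $S^*_\varphi(\mathcal{T})$, we have

$$\sum_{n\ge 1}\sum_{s\in T_n} b(n,s)\,\varphi\Big(\frac{Y_{n,s}(\omega)}{g(\omega)}\Big)\mathbf{1}_{\{|Y_{n,s}(\omega)| \le g(\omega)\}} \le S^*_\varphi(\mathcal{T})\,,$$

and combining with (16.175),

$$\sum_{n\ge 1}\sum_{s\in T_n} b(n,s)\,\varphi\Big(\frac{Y_{n,s}(\omega)}{g(\omega)}\Big)\mathbf{1}_{\{|Y_{n,s}(\omega)| > g(\omega)\}} \ge S^*_\varphi(\mathcal{T})\,. \qquad (16.176)$$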
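Now, (16.170) implies that $\varphi(y/g(\omega)) \le \varphi(y)/\psi(g(\omega))$ for $y \ge g(\omega)$ and $g(\omega) \ge 1$. Multiplying both sides of (16.176) by $\mathbf{1}_{\{g(\omega)\ge 1\}}$ and using that

$$\varphi\Big(\frac{Y_{n,s}(\omega)}{g(\omega)}\Big)\mathbf{1}_{\{|Y_{n,s}(\omega)| > g(\omega)\}}\mathbf{1}_{\{g(\omega)\ge 1\}} \le \frac{\varphi(Y_{n,s}(\omega))}{\psi(g(\omega))}\,,$$

we obtain

$$\mathbf{1}_{\{g(\omega)\ge 1\}}\,\psi(g(\omega))\,S^*_\varphi(\mathcal{T}) \le \sum_{n\ge 1}\sum_{s\in T_n} b(n,s)\,\varphi(Y_{n,s}(\omega))\,.$$

Since $E\varphi(Y_{n,s}) \le 1$ by (16.1), the expected value of the right-hand side is $\le S^*_\varphi(\mathcal{T})$, so taking expectation implies $E\mathbf{1}_{\{g\ge 1\}}\psi(g) \le 1$. Taking $x = y = 1$ in (16.170) proves that $\psi(1) \le 1$, so that $E\mathbf{1}_{\{g<1\}}\psi(g) \le 1$ and then $E\psi(g) \le 2$. Taking $g(\omega) = \alpha h(\omega)$ with $0 < \alpha < 1$ and letting $\alpha \to 1$ shows that $E\psi(h) \le 2$, so that $E\psi(h/2) \le 1$ and $\|h\|_\psi \le 2$. □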
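The quantity $\|X\|_\psi$ of (16.169), which this proof controls, can be estimated from a sample by bisection. For an even convex $\psi$ with $\psi(0) = 0$, the map $u \mapsto E\psi(X/u)$ is nonincreasing. The sketch below replaces the expectation by an empirical mean (an assumption made only for illustration); `orlicz_norm` is a hypothetical helper name.

```python
import math


def orlicz_norm(samples, psi, rel_tol=1e-12):
    """Empirical version of (16.169): inf{u > 0 : mean(psi(x / u)) <= 1}, via bisection."""
    top = max(abs(x) for x in samples)
    if top == 0:
        return 0.0

    def feasible(u):
        return sum(psi(x / u) for x in samples) / len(samples) <= 1.0

    lo, hi = 0.0, top
    while not feasible(hi):          # grow hi until the empirical E psi(X/u) <= 1
        lo, hi = hi, 2 * hi
    while hi - lo > rel_tol * hi:    # invariant: lo infeasible (or 0), hi feasible
        mid = (lo + hi) / 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


# psi(x) = x^2 gives the L2 norm; psi(x) = exp(x^2) - 1 gives a sub-Gaussian-type norm.
print(orlicz_norm([3.0, -3.0], lambda x: x * x))          # 3.0
print(orlicz_norm([1.0], lambda x: math.expm1(x * x)))    # close to 1/sqrt(log 2)
```

For a constant $X = c$ and $\psi(x) = e^{x^2} - 1$, the exact value is $|c|/\sqrt{\log 2}$, which gives a convenient check of the bisection.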


Condition (16.170) is essentially optimal, as the following challenging exercise 
shows: 


Exercise 16.8.17 Investigate the necessary conditions on the function $\psi$ so that for any metric space and any process $(X_t)_{t\in T}$ that satisfies (16.1), one has

$$\Big\|\sup_{s,t\in T}|X_s - X_t|\Big\|_\psi \le L\int_0^{\Delta(T,d)}\varphi^{-1}(N(T,d,\epsilon))\,d\epsilon\,. \qquad (16.177)$$

Hint: Consider $N$ and the space $T$ of cardinality $N$ where any two distinct points are at distance 1. Consider $\epsilon \le 1$, and consider disjoint events $(\Omega_t)_{t\in T}$ with $P(\Omega_t) = \epsilon/N$. Apply (16.177) to the process $(X_t)_{t\in T}$ given by $X_t = \varphi^{-1}(N/2\epsilon)\mathbf{1}_{\Omega_t}$.
We now briefly discuss the problem of the boundedness of processes that satisfy (16.1) in a general metric space, when the distance is not assumed to be p-concave (and in particular when $d$ is the usual Euclidean distance on $\mathbb{R}^n$). In this case, a new phenomenon takes place. In all the examples of chaining we have met up to now, the interpolation points $\pi_n(t)$ converge geometrically toward $t$, but this feature is not always optimal. To understand this, consider a toy example, the unit interval with the usual distance.

Proposition 16.9.1 Consider a process $(X_t)_{t\in[0,1]}$ that satisfies

$$\forall s,t \in [0,1],\quad E|X_s - X_t| \le |s - t|\,. \qquad (16.178)$$

Then

$$E\sup_{0\le s,t\le 1}|X_s - X_t| \le 1\,. \qquad (16.179)$$
Proof We have to show that if $F \subset [0,1]$ is finite, then $E\sup_{s,t\in F}|X_s - X_t| \le 1$. Let $F = \{t_1, \ldots, t_n\}$ with $0 \le t_1 < \cdots < t_n \le 1$. Then

$$E\sup_{k,\ell\le n}|X_{t_k} - X_{t_\ell}| \le E\sum_{1\le\ell<n}|X_{t_{\ell+1}} - X_{t_\ell}| \le \sum_{1\le\ell<n}(t_{\ell+1} - t_\ell) \le 1\,. \qquad \square$$
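The bound (16.179) can be attained. As an illustration (our own choice of example, not one taken from the text), let $V$ be uniform on $[0,1]$ and $X_t = \mathbf{1}_{\{V \le t\}}$. Then $E|X_s - X_t| = P(\min(s,t) < V \le \max(s,t)) = |s - t|$, so (16.178) holds with equality. On a finite set $F$, the supremum of the increments coincides with the telescoping sum used in the proof, and $E\sup_{s,t\in F}|X_s - X_t| = \max F - \min F$. The Monte Carlo sketch below checks this. All names are hypothetical.

```python
import random


def path(times, v):
    """Values of X_t = 1{v <= t} at the given times."""
    return [1.0 if v <= t else 0.0 for t in times]


def sup_increment(times, v):
    """sup_{s,t in F} |X_s - X_t| on the finite set F = times."""
    xs = path(times, v)
    return max(xs) - min(xs)


def chain_sum(times, v):
    """Telescoping bound sum_l |X_{t_{l+1}} - X_{t_l}| over the sorted times."""
    xs = path(sorted(times), v)
    return sum(abs(b - a) for a, b in zip(xs, xs[1:]))


rng = random.Random(0)
F = sorted(rng.random() for _ in range(20))
draws = [rng.random() for _ in range(100_000)]
estimate = sum(sup_increment(F, v) for v in draws) / len(draws)
assert all(sup_increment(F, v) <= chain_sum(F, v) for v in draws[:1000])
print(estimate, F[-1] - F[0])   # the two numbers should agree to about two decimals
```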


The following exercise shows that Proposition 16.9.1 cannot be deduced from Theorem 16.8.7:

Exercise 16.9.2 Prove that if $\mu$ is a probability measure on $[0,1]$, then

$$\int_0^1 dt\int_0^1\frac{1}{\mu(B(t,\epsilon))}\,d\epsilon = \infty\,.$$


The next exercise shows that the result of Proposition 16.9.1 cannot be explained 
by the size of the covering numbers of [0, 1]. 


Exercise 16.9.3 Denote by $t = (t_i)_{i\ge 1}$ the generic point of $T = \{0,1\}^{\mathbb{N}}$. On $T$, consider the ultrametric distance $\delta$ given by $\delta(s,t) = 2^{-i+1}$, where $i$ is the smallest integer for which $s_i \ne t_i$. Construct an unbounded process $(X_t)$ on $(T,\delta)$ that satisfies $E|X_s - X_t| \le \delta(s,t)$ for each $s,t \in T$. Compare the covering numbers of $(T,\delta)$ and $([0,1], d)$, where $d$ is the usual distance.


Exercise 16.9.4 Review the proof of Theorem 16.8.1 to show that when $\varphi(x) = |x|$, one can improve (16.125) into

$$E\sup_{s,t\in T}|X_s - X_t| \le L S^*_\varphi(\mathcal{T})\,.$$


We consider now the case of the processes on $T = [0,1]^2$, provided with the usual distance $d$. Which are the Young functions $\varphi$ such that all processes satisfying (16.1), i.e.,

$$\forall s,t \in T,\quad E\varphi\Big(\frac{X_s - X_t}{d(s,t)}\Big) \le 1\,,$$

are bounded? The covering numbers $N(T,d,\epsilon)$ behave like $\epsilon^{-2}$, so that (16.5) implies that it suffices that

$$\sum_n 2^{-n}\varphi^{-1}(2^{2n}) < \infty\,. \qquad (16.180)$$


Theorem 16.9.5 ([104, Theorem 5.1]) Processes which satisfy (16.1) are all bounded if and only if $\varphi$ satisfies the condition


ye) eae, (16.181) 


n 


where $\varphi'$ is the derivative of $\varphi$.
The difference between (16.180) and (16.181) can be seen clearly in the case $\varphi(x) = |x|$, where (16.181) is automatically satisfied but (16.180) is not. What happens here is that, as in Proposition 16.9.1, one can join two elements of $T$ by a long chain of small steps (and this is not the case in the setting of Exercise 16.9.3). Theorem 16.9.5 is obviously of more theoretical than practical interest, so we do not reproduce the specialized proof.

This is, however, not the end of the story. The reason why the weak condition (16.181) suffices for boundedness is a kind of "connectivity" in the structure of $[0,1]^2$. This connectivity structure does not exist when the distance is ultrametric, as Exercise 16.9.3 shows. There are also "intermediate situations" where both aspects are present, e.g., if one takes a product of $[0,1]^2$ with an ultrametric space. Complicated necessary and sufficient conditions are found in [104] in such a case. This probably indicates that no simple complete description of the metric spaces for which condition (16.1) implies boundedness can be found, even in the "homogeneous situation" where covering numbers suffice.


16.10 Notes and Comments 


Il y a les questions qui se posent, et les questions que l'on se pose.⁸ Henri Poincaré


Obviously, Poincaré had better mathematical taste than many subsequent mathematicians. My own view is that many of the problems considered in this chapter belong to the second category rather than the first, and I have included their solution in this book only because I find it excessively beautiful, in particular thanks to the work of W. Bednorz, who discovered a number of very clean and seemingly final arguments. A particularly important contribution of W. Bednorz is to have brought to light the technical importance of (16.12), after which everything becomes much easier.

I undertook a systematic study of boundedness of stochastic processes under increment conditions using majorizing measures in [104]. This paper contains near-optimal results, but several arguments have been greatly simplified by W. Bednorz. I undertook this project despite the fact that I felt the topic to be of marginal importance, because I thought that I had no chance of making progress in the Gaussian case without having first mastered the elusive notion of majorizing measure. This strategy was successful.

When the Young function $\varphi$ as in Sect. 16.8 has "polynomial growth" rather than "exponential growth", it does not seem possible to characterize the size of $T$ as measured by majorizing measures in terms of the size of the trees it contains, as we did in Sect. 3.1.


⁸ I won't dare attempt a literal translation, but roughly this distinguishes between questions of self-evident importance and more arbitrary questions one may ask.


Chapter 17
Shor's Matching Theorem


17.1 Introduction 


This chapter continues Chap. 4, which should be fresh in the reader’s mind before 
attempting to penetrate the more difficult material presented here. In particular, the 
notion of “evenly spread” points is explained on page 121. The main result is as 
follows: 


Theorem 17.1.1 (P. Shor) Consider evenly spread points $(Y_i)_{i\le N}$ of $[0,1]^2$. Set $Y_i = (Y_i^1, Y_i^2)$. Consider i.i.d. points $(X_i)_{i\le N}$ uniform over $[0,1]^2$, and set $X_i = (X_i^1, X_i^2)$. Then with probability $\ge 1 - LN^{-10}$, there exists a matching $\pi$ such that

$$\sum_{i\le N}|X_i^1 - Y_{\pi(i)}^1| \le L\sqrt{N\log N} \qquad (17.1)$$

$$\sup_{i\le N}|X_i^2 - Y_{\pi(i)}^2| \le L\sqrt{\frac{\log N}{N}}\,. \qquad (17.2)$$

The power $N^{-10}$ plays no special role. Theorem 17.1.1 improves upon Theorem 4.5.1 because when $|X_i^2 - Y_{\pi(i)}^2| \le L\sqrt{\log N/N}$ for each $i$, then $\sum_{i\le N}|X_i^2 - Y_{\pi(i)}^2| \le L\sqrt{N\log N}$.

A remarkable feature of Theorem 17.1.1 is that the two coordinates do not play the same role. Following this idea, one may ask the following:
Research Problem 17.1.2 (The Ultimate Matching Conjecture) Prove or disprove the following. Consider $\alpha_1, \alpha_2 > 0$ with $1/\alpha_1 + 1/\alpha_2 = 1/2$. Then with high probability, we can find a matching $\pi$ such that, for $j = 1, 2$, we have

$$\sum_{i\le N}\exp\bigg(\Big(\sqrt{\frac{N}{\log N}}\,\frac{|X_i^j - Y_{\pi(i)}^j|}{L}\Big)^{\alpha_j}\bigg) \le 2N\,. \qquad (17.3)$$


In Chap. 18, we shall prove a suitable version of the ultimate matching conjecture in dimension $d \ge 3$. Since

$$\sum_{i\le N}\exp a_i^4 \le 2N \implies \max_{i\le N}|a_i| \le L(\log N)^{1/4}$$

and since $\exp a^4 \ge |a|$, the case $\alpha_1 = \alpha_2 = 4$ would provide a very neat common generalization of Theorems 4.5.1 and 4.7.1. Another case of special interest is the case "$\alpha_1 = 2$, $\alpha_2 = \infty$", for which one may interpret (17.3) for $j = 2$ as meaning (17.2). Then (17.3) for $j = 1$ is very much stronger than (17.1). A partial result in the direction of this special case of Problem 17.1.2 is as follows:


Theorem 17.1.3 Consider a number $0 < \alpha < 1/2$, an integer $N \ge 2$, and evenly spread points $(Y_i)_{i\le N}$ of $[0,1]^2$. Set $Y_i = (Y_i^1, Y_i^2)$. Consider i.i.d. points $(X_i)_{i\le N}$ uniform over $[0,1]^2$, and set $X_i = (X_i^1, X_i^2)$. Then with probability $\ge 1 - LN^{-10}$, there exists a matching $\pi$ such that

$$\sum_{i\le N}\exp\bigg(\Big(\sqrt{\frac{N}{\log N}}\,\frac{|X_i^1 - Y_{\pi(i)}^1|}{K(\alpha)}\Big)^{\alpha}\bigg) \le 2N \qquad (17.4)$$

$$\sup_{i\le N}|X_i^2 - Y_{\pi(i)}^2| \le K(\alpha)\sqrt{\frac{\log N}{N}}\,. \qquad (17.5)$$

Since $\exp|x|^\alpha \ge |x|/K(\alpha)$, (17.1) follows from (17.4), and thus Theorem 17.1.3 improves upon Theorem 17.1.1. One may show that when $\alpha$ increases, the conclusion of Theorem 17.1.3 becomes stronger. We do not know how to prove Theorem 17.1.3 for $\alpha \ge 1/2$.


Conjecture 17.1.4 Theorem 17.1.3 holds for $\alpha = 2$.


This is the special case "$\alpha_1 = 2$, $\alpha_2 = \infty$" of the ultimate matching conjecture and a nice research problem by itself.

A central difficulty in the proof of a matching theorem is how to relate it to a suitable discrepancy theorem (here Theorem 17.2.1), and the most instructive part of the present section is how we overcome this difficulty. There is, however, a lethal weakness in our approach to the discrepancy theorem. It is explained in Sect. 17.3. Until one finds a way to correct this weakness, no amount of technical effort is going to succeed in reaching the value $\alpha = 2$. For this reason, we only outline the proof of Theorem 17.1.1, and we refer the reader to [129] for the far more technical proof of Theorem 17.1.3.


17.2 The Discrepancy Theorem


The proof of Theorem 17.1.1 relies again on Proposition 4.3.2 and a "discrepancy theorem" of the same nature as (4.38), but for a more complicated class of functions. This is Theorem 17.2.1. Stating this discrepancy theorem requires some preparation.

To prove Theorem 17.1.1, we may assume $N$ large, and we do not care about what happens at a scale less than $\sqrt{\log N/N}$. We will then replace $[0,1]^2$ by a discrete approximation at that scale. More precisely, let us consider a universal constant $L^*$ which we shall choose later. Consider the largest integer $p$ with $2^{-p} \ge L^*\sqrt{\log N/N}$, so that when $N$ is large enough, $p \le \log N$. The idea is to replace $[0,1]^2$ by the set of points $(k2^{-p}, \ell 2^{-p})$ for $1 \le k, \ell \le 2^p$. It is, however, pointless to carry the factor $2^{-p}$ through our calculations, so we rescale this set: we consider the set $G = \{1, \ldots, 2^p\}^2$.¹ A generic point of $G$ is denoted by $\tau$. Since $G = \{1, \ldots, 2^p\}^2$, we may also denote a point of $G$ by its coordinates $(k, \ell)$, which are two integers between 1 and $2^p$. To each point $\tau = (k, \ell)$ of $G$ corresponds a little square $H_\tau = ](k-1)2^{-p}, k2^{-p}] \times ](\ell-1)2^{-p}, \ell 2^{-p}]$ with sides of length $2^{-p}$, and these little squares form a partition of $]0,1]^2$. We define "evenly spread" points $(Z_i)_{i\le N}$ of $G$ as follows: we set $Z_i = \tau$ if $Y_i$ belongs to $H_\tau$. Thus, denoting by $Z_i^1$ and $Z_i^2$ the components of $Z_i$, we have

$$|2^{-p}Z_i^1 - Y_i^1| \le 2^{-p}\,;\quad |2^{-p}Z_i^2 - Y_i^2| \le 2^{-p}\,. \qquad (17.6)$$
We define

$$n(\tau) = \operatorname{card}\{i \le N;\ Z_i = \tau\}\,, \qquad (17.7)$$

so that²

$$\sum_{\tau\in G} n(\tau) = N\,. \qquad (17.8)$$

Since $2^{-p}$ is about $L^*\sqrt{\log N/N}$, $N2^{-2p}$ is about $L^{*2}\log N$ and hence large. Each square $H_\tau$ contains a large number of points $Y_i$, and due to the fact that these


¹ The notation $G$ does not have the same meaning as in Chap. 4. Now the "grid" $G$ is not a subset of $[0,1]^2$!
² We assume of course that the points $Y_i$ belong to $]0,1]^2$.
points are evenly spread, it should be obvious that this number of points is about the same for each square $H_\tau$. That is, given $N$ large, for a certain integer $m_0$, we have

$$\forall\tau \in G,\quad m_0 \le n(\tau) \le 2m_0\,. \qquad (17.9)$$

Summation of these inequalities over $\tau$ together with (17.8) implies

$$\frac{N2^{-2p}}{2} \le m_0 \le N2^{-2p}\,. \qquad (17.10)$$

Since $2^{-2p}N$ is about $L^{*2}\log N$ while $p \le \log N$, when $L^*$ is large, the ratio $m_0/p$ is large. We will prove that our arguments work when this ratio is large enough, which can be achieved by taking $L^*$ large enough.
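The discretization can be reproduced on a toy example. Purely for illustration, the sketch assumes $N = n^2$ with the evenly spread family given by the centers of the $n \times n$ grid squares. It takes $L^* = 1$ (the text instead chooses $L^*$ large) and uses the natural logarithm. The helper names are hypothetical. The code computes $p$, the cell $\tau$ with $Y_i \in H_\tau$ (the ceiling matches the half-open intervals $](k-1)2^{-p}, k2^{-p}]$), and the counts $n(\tau)$ of (17.7). It then checks (17.8), (17.9) with $m_0 = \min_\tau n(\tau)$, and (17.10).

```python
import math


def grid_level(N, L_star=1.0):
    """Largest integer p with 2^{-p} >= L_star * sqrt(log N / N)."""
    threshold = L_star * math.sqrt(math.log(N) / N)
    p = 0
    while 2.0 ** -(p + 1) >= threshold:
        p += 1
    return p


def cell_counts(points, p):
    """n(tau) = number of points in H_tau = ](k-1)2^-p, k2^-p] x ](l-1)2^-p, l2^-p]."""
    side = 2 ** p
    counts = {}
    for x, y in points:
        tau = (math.ceil(x * side), math.ceil(y * side))
        counts[tau] = counts.get(tau, 0) + 1
    return counts


n = 100
N = n * n
Y = [((i + 0.5) / n, (j + 0.5) / n) for i in range(n) for j in range(n)]
p = grid_level(N)
counts = cell_counts(Y, p)
m0 = min(counts.values())
assert sum(counts.values()) == N                                  # (17.8)
assert max(counts.values()) <= 2 * m0                             # (17.9)
assert N * 2.0 ** (-2 * p) / 2 <= m0 <= N * 2.0 ** (-2 * p)       # (17.10)
print(p, m0, max(counts.values()))   # 5 9 16
```

Here each column of cells receives 3 or 4 of the 100 abscissas, so every $n(\tau)$ lies in $\{9, 12, 16\}$.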

In our discrete approximation, the points $Z_i$ replace the points $Y_i$. Now we have to construct the random points $U_i$ valued in $G$ which replace the points $X_i$. The obvious procedure (to define $U_i = \tau$ when $X_i \in H_\tau$) is not the correct one;³ the correct one is as follows. By definition of an evenly spread family, each of the points $Y_i$ belongs to a little rectangle of area $1/N$. We denote by $K_\tau$ the union of these little rectangles for which $Y_i \in H_\tau$, so that $K_\tau$ is of area $n(\tau)/N$. We consider the $G$-valued r.v.s $U_i$ such that $U_i = \tau$ when $X_i \in K_\tau$. Thus, the r.v.s $U_i$ are i.i.d. with law $\mu$, where the probability measure $\mu$ on $G$ is given by

$$\mu(\{\tau\}) = \frac{n(\tau)}{N}\,. \qquad (17.11)$$


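Also, when $U_i = \tau = (k, \ell)$, we have by definition that $X_i$ belongs to the small rectangle associated with a point $Y_j \in H_\tau$. Then

$$|X_i^1 - Y_j^1| \le \frac{L}{\sqrt N}\,;\quad |X_i^2 - Y_j^2| \le \frac{L}{\sqrt N}$$

and

$$|2^{-p}U_i^1 - Y_j^1| \le 2^{-p}\,;\quad |2^{-p}U_i^2 - Y_j^2| \le 2^{-p}\,,$$

and combining these, we have

$$|2^{-p}U_i^1 - X_i^1| \le L2^{-p}\,;\quad |2^{-p}U_i^2 - X_i^2| \le L2^{-p}\,. \qquad (17.12)$$

For a function $h$ on $G$, we have

$$\int h\,d\mu = \frac{1}{N}\sum_{\tau\in G} n(\tau)h(\tau)\,. \qquad (17.13)$$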


³ We do not want $P(U_i = \tau) = 2^{-2p}$ but rather $P(U_i = \tau) = n(\tau)/N$.
We consider the class $\mathcal{H}$ of functions $h : G \to \mathbb{R}$ such that

$$\sum_{1\le k\le 2^p,\ 1\le\ell\le 2^p-1}|h(k,\ell+1) - h(k,\ell)| \le 2^{2p} \qquad (17.14)$$

$$\forall k,\ell,\quad |h(k+1,\ell) - h(k,\ell)| \le 1\,. \qquad (17.15)$$

To lighten notation, we will write (17.14) as $\sum|h(k,\ell+1) - h(k,\ell)| \le 2^{2p}$, and we will no longer mention that when a quantity such as $h(k,\ell+1) - h(k,\ell)$ occurs in a summation, it is always understood that we consider only the values of $\ell$ with $\ell + 1 \le 2^p$. In a similar manner, when the quantity $|h(k+1,\ell) - h(k,\ell)|$ occurs in a condition, it is always understood that we consider only the values of $k$ for which $k + 1 \le 2^p$.

It will become clear only gradually that the class $\mathcal{H}$ of functions is related to a matching problem. Let us say, however, that the weak restriction (17.14) is related to the fact that we ask the strong condition (17.2) on the second coordinates, whereas the strong restriction (17.15) is related to the fact that we ask only the weak condition (17.1) on the first coordinates.
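Membership in $\mathcal{H}$ is a finite check. The sketch below (function names and the dictionary representation of $h$ are illustrative assumptions) tests (17.14) and (17.15) on $G = \{1, \ldots, 2^p\}^2$. It also illustrates the normalization used in the end of the proof of Theorem 17.2.3. With $B = 2^{-2p}\sum|h(k,\ell+1) - h(k,\ell)|$ and $B' = B + 1$, any $h$ satisfying (17.15) has $h/B' \in \mathcal{H}$.

```python
def vertical_variation(h, p):
    """sum over k, l of |h(k, l+1) - h(k, l)|, the quantity bounded in (17.14)."""
    side = 2 ** p
    return sum(abs(h[(k, l + 1)] - h[(k, l)])
               for k in range(1, side + 1) for l in range(1, side))


def in_class_H(h, p, tol=1e-12):
    """Check (17.14) and (17.15) for h given as a dict {(k, l): value} on G = {1..2^p}^2."""
    side = 2 ** p
    horizontal_ok = all(abs(h[(k + 1, l)] - h[(k, l)]) <= 1 + tol
                        for k in range(1, side) for l in range(1, side + 1))
    return vertical_variation(h, p) <= 2 ** (2 * p) + tol and horizontal_ok


def normalize(h, p):
    """Return h / B' with B = 2^{-2p} * vertical_variation(h) and B' = B + 1."""
    B = vertical_variation(h, p) / 2 ** (2 * p)
    return {tau: value / (B + 1) for tau, value in h.items()}


p = 2
h = {(k, l): k + 3 * l for k in range(1, 2 ** p + 1) for l in range(1, 2 ** p + 1)}
print(in_class_H(h, p))                # False: the vertical variation is 36 > 16
print(in_class_H(normalize(h, p), p))  # True
```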

The central ingredient of our approach is the following: 


Theorem 17.2.1 Consider independent r.v.s $U_i$ valued in $G$, of law $\mu$. Then, with probability $\ge 1 - \exp(-46p)$, we have

$$\forall h \in \mathcal{H},\quad \Big|\sum_{i\le N}\Big(h(U_i) - \int h\,d\mu\Big)\Big| \le L\sqrt{pm_0}\,2^{2p}\,. \qquad (17.16)$$

We shall explain soon how to turn this type of result into a matching theorem. The larger the class $\mathcal{H}$ in (17.16), the better the matching theorem one gets. It is therefore natural to ask for which classes of functions a result such as Theorem 17.2.1 might be true.


Research Problem 17.2.2 Consider two functions $\theta_1(x) \ge x$, $\theta_2(x) \ge x$. Consider the class $\mathcal{H}$ of functions $h : G \to \mathbb{R}$ such that

$$\sum\theta_1(|h(k+1,\ell) - h(k,\ell)|) + \sum\theta_2(|h(k,\ell+1) - h(k,\ell)|) \le 2^{2p}\,. \qquad (17.17)$$

What are the conditions on $\theta_1$ and $\theta_2$ so that

$$E\sup_{h\in\mathcal{H}}\Big|\sum_{i\le N}\Big(h(U_i) - \int h\,d\mu\Big)\Big| \le K\sqrt{pm_0}\,2^{2p} \qquad (17.18)$$

for a constant $K$ independent of $p$?

Of particular interest is the case $\theta_1(x) = x(\log(3+x))^{1/2}$ and $\theta_2(x) = x$. A positive answer (and significant extra work) would allow one to prove Conjecture 17.1.4.
We shall outline the proof of Theorem 17.2.1 in the next section, but we first state the matching theorem it implies. In the following statement, we denote by $U_i^1$ and $U_i^2$ the components of $U_i$, and similarly for $Z_i$:

Theorem 17.2.3 There exists a number $L_0$ with the following property. Recalling the number $m_0$ of (17.9), assume that

$$p \le \frac{m_0}{L_0}\,. \qquad (17.19)$$

Consider points $(Z_i)_{i\le N}$ in $G$, and assume that for each $\tau \in G$, we have $\operatorname{card}\{i \le N;\ Z_i = \tau\} = n(\tau) = N\mu(\{\tau\})$. Consider points $(U_i)_{i\le N}$ in $G$, and assume that (17.16) holds. Then we can find a permutation $\pi$ of $\{1, \ldots, N\}$ for which

$$\sum_{i\le N}|U_i^1 - Z_{\pi(i)}^1| \le N \qquad (17.20)$$

$$\forall i \le N,\quad |U_i^2 - Z_{\pi(i)}^2| \le 1\,. \qquad (17.21)$$


It is unimportant to have N rather than LN in (17.20). 


Proof of Theorem 17.1.1. We consider the points $Z_i$ of $G$ given by $Z_i = \tau$ when $Y_i \in H_\tau$, and the random points $U_i$ in $G$ given by $U_i = \tau$ when $X_i \in K_\tau$ (the union of the little rectangles associated with the points $Y_j \in H_\tau$). As we have already observed just below (17.10), when $L^*$ becomes large, the ratio $m_0/p$ becomes large, so that if $L^*$ is large enough, then (17.19) holds. Also, since $p$ is of order $\log N$, for $N$ large we have $\exp(-46p) \le N^{-10}$, so that according to Theorem 17.2.1, (17.16) holds with probability $\ge 1 - N^{-10}$. When this is the case, and since $2^{-p} \le L\sqrt{\log N/N}$, it follows from (17.6) and (17.12) that the permutation $\pi$ which satisfies (17.20) and (17.21) also satisfies (17.1) and (17.2). □


Beginning of the Proof of Theorem 17.2.3. The first steps of the proof look canonical. Let us define

$$M_N = \sup\sum_{i\le N}(w_i + w'_i)\,, \qquad (17.22)$$

where the supremum is taken over all families $(w_i)$, $(w'_i)$ for which

$$\forall i,j \le N,\quad |U_i^2 - Z_j^2| \le 1 \implies w_i + w'_j \le |U_i^1 - Z_j^1|\,. \qquad (17.23)$$

We first claim that there is a permutation $\pi$ which satisfies (17.21) and for which $\sum_{i\le N}|U_i^1 - Z_{\pi(i)}^1| \le M_N$. To see this, let us consider a number $c > 0$, and let us define $c_{i,j} = |U_i^1 - Z_j^1|$ when $|U_i^2 - Z_j^2| \le 1$ and $c_{i,j} = c$ otherwise. Consider then numbers $(w_i)_{i\le N}$ and $(w'_i)_{i\le N}$ such that $w_i + w'_j \le c_{i,j}$ for each $i, j \le N$. Then (17.23) holds, so that by definition of $M_N$, we have $\sum_{i\le N}(w_i + w'_i) \le M_N$. It follows from Proposition 4.3.2 that there is a permutation $\pi$ such that $\sum_{i\le N}c_{i,\pi(i)} \le M_N$. If one takes $c = 2M_N$, this shows in particular that no term in the sum equals $c$, so that for each term, we have $|U_i^2 - Z_{\pi(i)}^2| \le 1$ and $c_{i,\pi(i)} = |U_i^1 - Z_{\pi(i)}^1|$. This proves the claim.
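The penalty construction in the claim can be tried on a toy instance. The sketch below is a brute-force stand-in: instead of the duality of Proposition 4.3.2, it simply minimizes $\sum_i c_{i,\pi(i)}$ over all permutations, which is only feasible for very small $N$. The points and the penalty $c = 100$ are hypothetical (the text takes $c = 2M_N$). Once $c$ exceeds the cost of some matching obeying the vertical constraint, an optimal permutation never pays the penalty, so it satisfies (17.21).

```python
from itertools import permutations


def penalized_costs(U, Z, c):
    """c_{ij} = |U_i^1 - Z_j^1| if |U_i^2 - Z_j^2| <= 1, and the penalty c otherwise."""
    return [[abs(u[0] - z[0]) if abs(u[1] - z[1]) <= 1 else c for z in Z] for u in U]


def best_permutation(cost):
    """Brute-force minimizer of sum_i cost[i][pi(i)] (small N only)."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda pi: sum(cost[i][pi[i]] for i in range(n)))
    return best, sum(cost[i][best[i]] for i in range(n))


U = [(1, 1), (4, 2), (2, 3), (3, 4)]   # hypothetical points U_i = (U_i^1, U_i^2)
Z = [(3, 1), (1, 2), (4, 3), (2, 4)]   # hypothetical points Z_j = (Z_j^1, Z_j^2)
pi, total = best_permutation(penalized_costs(U, Z, c=100))
assert all(abs(U[i][1] - Z[pi[i]][1]) <= 1 for i in range(len(U)))   # (17.21)
print(pi, total)   # (1, 0, 3, 2) 2
```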
The idea now is, considering a family $(w'_i)$, to define the function $h'$ on $G$ given by

$$h'(k,\ell) = \min\{|k - Z_j^1| - w'_j;\ |\ell - Z_j^2| \le 1\}\,.$$

Rewriting (17.23) as

$$\forall i,j \le N,\quad |U_i^2 - Z_j^2| \le 1 \implies w_i \le |U_i^1 - Z_j^1| - w'_j\,,$$

we have $h'(U_i) \ge w_i$, and thus (17.22) implies

$$M_N \le \sum_{i\le N}(h'(U_i) + w'_i)\,. \qquad (17.24)$$

This construction is a bit clumsy, because given $\tau \in G$, there are quite a few values of $i$ for which $Z_i = \tau$. A moment's thought shows that one can only increase the right-hand side of (17.24) if one replaces the corresponding values of $w'_i$ by their average (over all the values of $i$ for which $Z_i = \tau$). To pursue this idea, given numbers $(u(\tau))_{\tau\in G}$, we define the function $h$ on $G$ given by

$$h(k,\ell) = \min\{|k - r| + u(r,s);\ (r,s) \in G,\ |\ell - s| \le 1\}\,. \qquad (17.25)$$

For $\tau = (k,\ell) \in G$, let us then choose

$$u(\tau) = -\frac{1}{n(\tau)}\sum_{i\le N,\ Z_i = \tau}w'_i\,, \qquad (17.26)$$

so that

$$\sum_{i\le N}w'_i = -\sum_{\tau\in G}n(\tau)u(\tau)\,. \qquad (17.27)$$

The infimum of the numbers $-w'_j$ for $Z_j = \tau$ is less than their average, so that given $\tau$ and $j$ with $Z_j = \tau$, we can find $j'$ with $Z_{j'} = Z_j = \tau$ and $-w'_{j'} \le u(\tau)$. Comparing the definitions of $h$ and $h'$ proves that $h' \le h$. Consequently, (17.24) and (17.27) imply

$$M_N \le \sum_{i\le N}h(U_i) - \sum_{\tau\in G}n(\tau)u(\tau)\,. \qquad (17.28)$$
Using (17.13) and (17.28), we get

$$M_N \le \Big|\sum_{i\le N}\Big(h(U_i) - \int h\,d\mu\Big)\Big| - \sum_{\tau\in G}n(\tau)(u(\tau) - h(\tau))\,. \qquad (17.29)$$

The hope now is that

$$\sum_{\tau\in G}n(\tau)(u(\tau) - h(\tau)) \text{ small} \implies \text{the function } h \text{ behaves well}\,, \qquad (17.30)$$

so that we may have a chance that with high probability $\sup_h|\sum_{i\le N}(h(U_i) - \int h\,d\mu)|$ (where the supremum is taken over all functions $h$ arising in this manner) is small, and consequently that the right-hand side of (17.29) is bounded. The difficulty (which is generic when deducing matching theorems from Proposition 4.3.2) is to find a usable way to express that "$h$ behaves well". In the present case, this difficulty is solved by the following result:


Proposition 17.2.4 Consider numbers $u(k,\ell)$ for $(k,\ell) \in G = \{1, \ldots, 2^p\}^2$, and consider the function $h$ of (17.25), i.e.,

$$h(k,\ell) = \inf\{u(r,s) + |k - r|;\ (r,s) \in G,\ |\ell - s| \le 1\}\,. \qquad (17.31)$$

Then

$$\forall k,\ell,\quad |h(k+1,\ell) - h(k,\ell)| \le 1 \qquad (17.32)$$

and, assuming (17.19),

$$m_0\sum_{k,\ell}|h(k,\ell+1) - h(k,\ell)| \le L\sum_{\tau\in G}n(\tau)(u(\tau) - h(\tau))\,. \qquad (17.33)$$

So, when the left-hand side of (17.30) is small, $h$ behaves well in the sense that (17.32) holds and that $m_0\sum_{k,\ell}|h(k,\ell+1) - h(k,\ell)|$ is also small. This is what motivated the introduction of the class $\mathcal{H}$ and of (17.14). The proof of Proposition 17.2.4 is elementary and rather unrelated to the main ideas of this work. It is given in Sect. B.6.
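Inequality (17.32) and the bound $h \le u$ (take $(r,s) = (k,\ell)$ in (17.25)) can be checked by brute force. The sketch below evaluates (17.25) directly for integer-valued $u$, chosen so that no floating-point round-off interferes. The grid size and the random $u$ are arbitrary test choices. Inequality (17.33), which needs (17.19), is not checked here.

```python
import random


def envelope(u, side):
    """h(k, l) = min{ |k - r| + u[(r, s)] : 1 <= r, s <= side, |l - s| <= 1 }, as in (17.25)."""
    h = {}
    for k in range(1, side + 1):
        for l in range(1, side + 1):
            h[(k, l)] = min(abs(k - r) + u[(r, s)]
                            for r in range(1, side + 1)
                            for s in range(max(1, l - 1), min(side, l + 1) + 1))
    return h


side = 8
rng = random.Random(2)
u = {(k, l): rng.randint(-20, 20) for k in range(1, side + 1) for l in range(1, side + 1)}
h = envelope(u, side)
assert all(abs(h[(k + 1, l)] - h[(k, l)]) <= 1
           for k in range(1, side) for l in range(1, side + 1))   # (17.32)
assert all(h[tau] <= u[tau] for tau in u)                        # h <= u
```

The first assertion holds because moving from $k$ to $k+1$ changes each term $|k - r| + u(r,s)$ by at most 1, and hence changes the minimum by at most 1.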


End of the Proof of Theorem 17.2.3. We have to show that, provided the constant $L_0$ of (17.19) is large enough, the right-hand side of (17.29) is $\le N$. We define

$$B = 2^{-2p}\sum|h(k,\ell+1) - h(k,\ell)| \qquad (17.34)$$

and $B' = B + 1$, so that $B' \ge 1$ and $h/B' \in \mathcal{H}$. Then (17.16) implies

$$\Big|\sum_{i\le N}\Big(h(U_i) - \int h\,d\mu\Big)\Big| \le L\sqrt{pm_0}\,2^{2p}B'\,, \qquad (17.35)$$

whereas (17.33) and (17.34) imply

$$\sum_{\tau\in G}n(\tau)(u(\tau) - h(\tau)) \ge \frac{m_0}{L}\sum|h(k,\ell+1) - h(k,\ell)| = \frac{m_0 2^{2p}}{L}B\,.$$

Combining with (17.29) and (17.35), we get

$$M_N \le L\sqrt{pm_0}\,2^{2p}B' - \frac{m_0 2^{2p}}{L}B \le 2^{2p}B\Big(L\sqrt{pm_0} - \frac{m_0}{L}\Big) + L\sqrt{pm_0}\,2^{2p}\,, \qquad (17.36)$$

using that $B' = B + 1$ in the last inequality. Consequently, if the constant $L_0$ in (17.19) is large enough, the first term is nonpositive, so that (17.36) implies, as desired, that $M_N \le L\sqrt{pm_0}\,2^{2p} \le m_0 2^{2p} \le N$, using (17.19) for an appropriate choice of $L_0$ and (17.10). □


17.3 Lethal Weakness of the Approach


We turn to the proof of Theorem 17.2.1. The main difficulty is to control $\gamma_2(\mathcal{H}, d_2)$, where $d_2$ is the distance in $\ell^2(G)$ or, equivalently, the Euclidean distance on $\mathbb{R}^G$. Ideally, this should be done by using a suitable functional and Theorem 2.9.1. However, the functional can be discovered only by understanding the underlying geometry, which the author does not understand.

How should one use condition (17.14)? To bypass this difficulty, we will replace this condition by the more familiar Lipschitz-type condition

$$|h(k,\ell+1) - h(k,\ell)| \le 2^j\,. \qquad (17.37)$$

More precisely, we consider the class $\mathcal{H}_1$ consisting of the functions $h : G \to \mathbb{R}$ such that

$$\forall k,\ell,\quad |h(k+1,\ell) - h(k,\ell)| \le 1\,;\quad |h(k,\ell+1) - h(k,\ell)| \le 1\,. \qquad (17.38)$$

Given an integer $j \ge 2$, for a number $V > 0$, we consider the class $\mathcal{H}_j(V)$ of functions $h : G \to \mathbb{R}$ such that

$$\forall k,\ell,\quad |h(k+1,\ell) - h(k,\ell)| \le 1\,,\quad |h(k,\ell+1) - h(k,\ell)| \le 2^j \qquad (17.39)$$

$$\operatorname{card}\{(k,\ell) \in G;\ h(k,\ell) \ne 0\} \le V\,. \qquad (17.40)$$

The key to our approach is the following, which is proved in Sect. B.5: 
Proposition 17.3.1 If $h \in \mathcal{H}$, we can decompose

$$h = \sum_{j\ge 1}h_j\,,\quad\text{where } h_1 \in L\mathcal{H}_1 \text{ and } h_j \in L\mathcal{H}_j(2^{2p-j}) \text{ for } j \ge 2\,. \qquad (17.41)$$

Before we analyze the classes $\mathcal{H}_j := \mathcal{H}_j(2^{2p-j})$, let us reveal the dirty secret. To prove (17.16), we will deduce from (17.41) the inequality

$$\sup_{h\in\mathcal{H}}\Big|\sum_{i\le N}\Big(h(U_i) - \int h\,d\mu\Big)\Big| \le L\sum_{j\ge 1}\sup_{h\in\mathcal{H}_j}\Big|\sum_{i\le N}\Big(h(U_i) - \int h\,d\mu\Big)\Big|\,. \qquad (17.42)$$

That is (Heaven forbid!), we bound the supremum of a sum by the sum of the suprema.⁴ This entails a loss which seems to prevent reaching the correct value $\alpha = 2$ in Theorem 17.1.3.

Let us now try to understand the size of the classes $\mathcal{H}_j$. In (17.39), the first and second coordinates play different roles. The continuous equivalent of this is the class of functions $f$ on the unit square which satisfy $|f(x,y) - f(x',y)| \le |x - x'|$ and $|f(x,y') - f(x,y)| \le 2^j|y' - y|$. The function $T(f)$ given by $T(f)(x,y) = f(U(x,y))$, where $U(x,y) = (2^{j/2}x, 2^{-j/2}y)$, is basically $2^{j/2}$-Lipschitz, whereas $U$ preserves Lebesgue's measure. Thus, one should think that (17.39) means "the function is $2^{j/2}$-Lipschitz". On the other hand, it turns out (even though this is not obvious at this stage) that condition (17.40) creates a factor $2^{-j}$. Thus, the "size of the class $\mathcal{H}_j$ should be $2^{-j/2}$ times the size of the class $\mathcal{H}_1$". We thus expect that the right-hand side of (17.42) will converge as a geometric series, therefore requiring level zero of sophistication.
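The heuristic rescaling in the previous paragraph amounts to the following computation. This is a sketch for differentiable $f$ with $|\partial_1 f| \le 1$ and $|\partial_2 f| \le 2^j$, and it is not part of the proof:

```latex
\begin{aligned}
U(x,y) &= (2^{j/2}x,\ 2^{-j/2}y), \qquad \det DU = 2^{j/2}\cdot 2^{-j/2} = 1,\\
|\partial_x (f\circ U)(x,y)| &= 2^{j/2}\,|\partial_1 f(U(x,y))| \le 2^{j/2},\\
|\partial_y (f\circ U)(x,y)| &= 2^{-j/2}\,|\partial_2 f(U(x,y))| \le 2^{-j/2}\cdot 2^{j} = 2^{j/2}.
\end{aligned}
```

So $T(f) = f\circ U$ is $2^{j/2}$-Lipschitz in each coordinate, and $U$ preserves area.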

The central step in the proof of Theorem 17.2.1 is as follows: 


Proposition 17.3.2 Consider $1 \le k_1 \le k_2 \le 2^p$, $1 \le \ell_1 \le \ell_2 \le 2^p$ and $R = \{k_1, \ldots, k_2\} \times \{\ell_1, \ldots, \ell_2\}$. Assume that

$$\ell_2 - \ell_1 + 1 = 2^{-j}(k_2 - k_1 + 1)\,. \qquad (17.43)$$

Consider independent r.v.s $U_i$ valued in $G$, of law $\mu$. Then, with probability at least $1 - L\exp(-50p)$, the following occurs. Consider any function $h : G \to \mathbb{R}$, and assume that

$$h(k,\ell) = 0 \text{ unless } (k,\ell) \in R \qquad (17.44)$$

$$(k,\ell), (k+1,\ell) \in R \implies |h(k+1,\ell) - h(k,\ell)| \le 1 \qquad (17.45)$$

$$(k,\ell), (k,\ell+1) \in R \implies |h(k,\ell+1) - h(k,\ell)| \le 2^j \qquad (17.46)$$

$$\forall (k,\ell) \in R,\quad |h(k,\ell)| \le 2(k_2 - k_1)\,. \qquad (17.47)$$

⁴ Just as happens in traditional chaining based on entropy numbers.
Then

$$\Big|\sum_{i\le N}\Big(h(U_i) - \int h\,d\mu\Big)\Big| \le L2^{j/2}\sqrt{pm_0}\operatorname{card}R\,, \qquad (17.48)$$

where $m_0$ is as in (17.9) and (17.10).

In words, $R$ is a rectangle of the appropriate shape (17.43), reflecting the fact that the two coordinates in (17.39) do not play the same role. The function $h$ is zero outside $R$, and its restriction to $R$ satisfies the conditions (17.45) and (17.46). This, however, does not suffice to obtain a control as in (17.48), as is shown by the case where $h$ is constant on $R$, so that a mild control on the size of $h$ is required, as in (17.47).

Besides the fact that we do not assume that $h$ is zero on the boundary of $R$ (and require a stronger control than just in expectation), Proposition 17.3.2 is really a discrete version of Proposition 4.5.19. Therefore, as expected, no really new idea is required, only technical work, for example replacing the use of the Fourier transform by a discrete version of it (a technique which is detailed in Sect. 4.5.2). For this reason, we have decided not to include the proof of Proposition 17.3.2.⁵

We will also use the following, in the same spirit as Proposition 17.3.2, but very 
much easier. It covers the case of “flat rectangles”. 


Proposition 17.3.3 Consider $1 \le k_1 \le k_2 \le 2^p$, $1 \le \ell_0 \le 2^p$ and $R = \{k_1, \ldots, k_2\} \times \{\ell_0\}$. Assume that

$$k_2 - k_1 + 1 \le 2^j\,. \qquad (17.49)$$


Consider independent r.v.s $U_i$ valued in $G$, of law $\mu$. Then, with probability at least $1 - L\exp(-50p)$, the following occurs. Consider any function $h : G \to \mathbb{R}$, and assume that

$$h(k,\ell) = 0 \text{ unless } (k,\ell) \in R\,, \qquad (17.50)$$

$$(k,\ell), (k+1,\ell) \in R \implies |h(k+1,\ell) - h(k,\ell)| \le 1\,, \qquad (17.51)$$

$$\forall (k,\ell) \in R,\quad |h(k,\ell)| \le 2(k_2 - k_1)\,. \qquad (17.52)$$


Then

$$\Big|\sum_{i\le N}\Big(h(U_i) - \int h\,d\mu\Big)\Big| \le L\sqrt{pm_0}\,(k_2 - k_1 + 1)^{3/2} \le L2^{j/2}\sqrt{pm_0}\operatorname{card}R\,. \qquad (17.53)$$


⁵ Most readers are likely to be satisfied with a global understanding of Shor's matching theorem and would not read these proofs. The exceptional reader willing to have a run at the ultimate matching conjecture should figure out the details herself as preparatory training. Finally, those really eager to master all the technical details can find them in [132].




Proof of Theorem 17.2.1. In Proposition 17.3.2, there are (crudely) at most $2^{4p}$ choices for the quadruplet $(k_1, k_2, \ell_1, \ell_2)$. Thus, with probability at least $1 - L\exp(-46p)$, the conclusions of Proposition 17.3.2 are true for all values of $k_1$, $k_2$, $\ell_1$, and $\ell_2$, and the conclusions of Proposition 17.3.3 hold for all values of $k_1$, $k_2$, and $\ell_0$. We assume that this is the case in the remainder of the proof. Under these conditions, we show the following:

$$h \in \mathcal{H}_1 \implies \Big|\sum_{i\le N}\Big(h(U_i) - \int h\,d\mu\Big)\Big| \le L\sqrt{pm_0}\,2^{2p}\,, \qquad (17.54)$$

$$h \in \mathcal{H}_j,\ j \ge 2 \implies \Big|\sum_{i\le N}\Big(h(U_i) - \int h\,d\mu\Big)\Big| \le L\sqrt{pm_0}\,2^{2p - j/2}\,. \qquad (17.55)$$

The conclusion then follows from the decomposition (17.41) of a function in $\mathcal{H}$ provided by Proposition 17.3.1.


The proof of (17.54) relies on the case $k_1 = \ell_1 = 1$ and $k_2 = \ell_2 = 2^p$ of Proposition 17.3.2. The function $h$ satisfies (17.45) and (17.46), and hence $|h(k,\ell) - h(1,1)| \le 2^{p+1} - 2$ for each $(k,\ell)\in G$. Consequently, the function $h^*(k,\ell) = h(k,\ell) - h(1,1)$ satisfies (17.44), (17.45), (17.46), and (17.47). Therefore, $h^*$ satisfies (17.54), and since adding a constant to $h$ does not change $\sum_{i\le N}(h(U_i) - \int h\,d\mu)$, this is also the case for $h$.

Next, we turn to the proof of (17.55), and we fix $j$ once and for all. The method is to find a family $\mathcal{R}$ of rectangles with the following properties:

• The rectangles in $\mathcal{R}$ all have the appropriate shape to apply either Proposition 17.3.2 or Proposition 17.3.3.

• The family $\mathcal{R}$ is disjoint and covers the support of $h$.

• The sum of the cardinalities of the rectangles in $\mathcal{R}$ is at most eight times the cardinality of the support of $h$.

• Each of the functions $h\mathbf{1}_R$ for $R\in\mathcal{R}$ satisfies either (17.47) or (17.52).


We then write $h = \sum_{R\in\mathcal{R}} h\mathbf{1}_R$, and we apply either (17.48) or (17.53) to each term to obtain the desired result. To understand the idea of the construction of the family $\mathcal{R}$, one can start with an exercise. Denoting by $\lambda$ Lebesgue's measure on the unit square, the exercise is to prove that a measurable subset $A$ of the unit square with $\lambda(A) < 1/8$ can be covered by a union of disjoint dyadic squares such that for each square $C$ in this family, one has $\lambda(C)/8 \le \lambda(A\cap C) \le \lambda(C)/2$. To see this, one recursively removes (starting with the larger squares) the dyadic squares $C$ for which $\lambda(C\cap A) \ge \lambda(C)/8$. The condition $\lambda(A\cap C) \le \lambda(C)/2$ is an automatic consequence of the fact that the dyadic square of four times the area of $C$ and containing $C$ has not been previously removed. The point of this condition is to ensure that $C$ is not contained in $A$. In the construction of the family $\mathcal{R}$, this will ensure that $h$ takes the value zero for at least one point of each rectangle $R$, so that since it is 1-Lipschitz, we may bound it as required by (17.47) or (17.52). The unsurprising details of the construction may be found in Sect. B.4.
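The recursive removal in this exercise is easy to try out on a discrete grid. The following Python sketch is only an illustration under stated assumptions: the unit square is replaced by a hypothetical $2^n\times 2^n$ grid of cells, $A$ by a 0/1 array of density $< 1/8$, and the name `dyadic_cover` is ours; it is not the construction of Sect. B.4.

```python
import random


def dyadic_cover(A):
    """Greedy dyadic covering of the 1-cells of a 2^n x 2^n 0/1 grid A.

    Returns disjoint dyadic squares (i, j, s) (corner (i, j), side s) that
    cover every cell with A[i][j] == 1, each with s*s/8 <= |A n C| < s*s/2.
    """
    size = len(A)
    total = sum(map(sum, A))
    # density < 1/8, the analogue of lambda(A) < 1/8: the whole square is not taken
    assert 8 * total < size * size

    # 2D prefix sums, so that |A n C| costs O(1) for each dyadic square C
    S = [[0] * (size + 1) for _ in range(size + 1)]
    for i in range(size):
        for j in range(size):
            S[i + 1][j + 1] = S[i][j + 1] + S[i + 1][j] - S[i][j] + A[i][j]

    def count(i, j, s):
        return S[i + s][j + s] - S[i][j + s] - S[i + s][j] + S[i][j]

    chosen = []

    def visit(i, j, s):
        # Squares are examined from the largest down. A square is reached only if
        # its parent (four times the area) was not taken: 8 |A n parent| < 4 s^2,
        # which gives the upper bound |A n C| < s^2 / 2 for the squares we take.
        if 8 * count(i, j, s) >= s * s:
            chosen.append((i, j, s))
            return
        if s > 1:
            half = s // 2
            for di in (0, half):
                for dj in (0, half):
                    visit(i + di, j + dj, half)

    visit(0, 0, size)
    return chosen


rng = random.Random(0)
grid = [[1 if rng.random() < 0.05 else 0 for _ in range(32)] for _ in range(32)]
squares = dyadic_cover(grid)
```

Every square of side 2 containing a point of $A$ satisfies the selection test, so the recursion never needs to go below side 2, and the total area of the chosen squares is at most $8|A|$.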


Chapter 18
The Ultimate Matching Theorem in Dimension 3


18.1 Introduction 


In this chapter, we continue the study of matchings, but in dimension $d = 3$ rather than 2.¹ We consider i.i.d. r.v.s $(X_i)_{i\le N}$ uniformly distributed over the set $[0,1]^3$. We want to match these points to nonrandom "evenly spread" points $(Y_i)_{i\le N}$.² Here, we say that $(Y_i)_{i\le N}$ are evenly spread if one can cover $[0,1]^3$ with $N$ rectangular boxes with disjoint interiors, such that each box $R$ has a three-dimensional volume $1/N$, contains exactly one point $Y_i$, and is such that $R\subset B(Y_i,10N^{-1/3})$. Each point of $[0,1]^3$ belongs to such a box $R$ and is within distance $10N^{-1/3}$ of a point $Y_i$.

The plan is to prove that for as large as possible a function $\varphi$, with probability close to 1, there exists a permutation $\pi$ of $\{1,\dots,N\}$ such that

$$\frac{1}{N}\sum_{i\le N}\varphi\Big(\frac{X_i - Y_{\pi(i)}}{N^{-1/3}}\Big) \le 2, \tag{18.1}$$

where $N^{-1/3}$ is the scaling factor which is appropriate to dimension 3. For example, one might consider a function such as $\varphi(X) = \exp d(X,0)^3$, where $d(X,0)$ is the distance between $X$ and $0$. Then (18.1) implies that the distance between $X_i$ and $Y_{\pi(i)}$ is typically about $N^{-1/3}$, and it gives a precise control on the number of indexes $i$ for which it is significantly larger.

Let us try to explain in words the difference between the situation in dimension 3 and in dimension 2. In dimension 2, there are irregularities at all scales in the distribution of a random sample $(X_i)_{i\le N}$ of $[0,1]^2$, and these irregularities


¹ No new ideas are required to cover the case $d > 3$.


² The reader may review the beginning of Sect. 4.3 at this stage.
combine to create the mysterious fractional powers of $\log N$. In dimension 3, no such phenomenon occurs, but there are still irregularities at many different scales. Cubes of volume about $A/N$ with a dramatic deficit of points $X_i$ exist for $A$ up to about $\log N$. The larger $A$, the fewer such cubes. The essential feature of dimension $\ge 3$ is that, as we will detail below, irregularities at different scales cannot combine. Still, there typically exists a cube of side about $(\log N/N)^{1/3}$ which contains no point $X_i$. A point $Y_i$ close to the center of this cube has to be matched with a point $X_j$ at distance about $(\log N/N)^{1/3}$. Thus, if $\varphi$ satisfies (18.1) and is a function $\varphi(X) = f(d(X,0))$ of the distance of $X$ to $0$, the function $f$ cannot grow faster than $\exp x^3$.

We may also consider functions $\varphi$ for which the different coordinates in $\mathbb{R}^3$ play different roles. It is then the scarcity of points $X_i$ inside certain rectangles which will provide obstacles to matchings.

Although this is not obvious at this stage, it turns out that the important characteristic of the function $\varphi$ is the sequence of sets $(\{\varphi\le N_k\})_{k\ge1}$. We will assume that these sets are rectangles with sides parallel to the coordinate axes, and we change perspective; we use these rectangles (rather than $\varphi$) as the basic object. To describe such rectangles, for each $k\ge0$, we consider three integers $n_j(k)\ge0$, $1\le j\le3$, with the following properties: Each sequence $(n_j(k))_{k\ge0}$ is non-decreasing and

$$\sum_{j\le3}n_j(k) = k. \tag{18.2}$$
We define

$$S_k = \prod_{j\le3}\big[-2^{n_j(k)},\,2^{n_j(k)}\big], \tag{18.3}$$

so that, denoting by $\lambda$ the volume measure, $\lambda(S_k) = 2^{k+3}$ by (18.2). Thus, to go from $S_k$ to $S_{k+1}$, one doubles the size of one side, not necessarily the same at each step. We note for further use that

$$S_{k+1}\subset 2S_k. \tag{18.4}$$


Recalling our notation $N_0 = 1$, $N_k = 2^{2^k}$ for $k\ge1$, let us then define a function $\varphi$ by

$$\varphi(x) = \inf\{N_k\,;\ x\in S_k\}, \tag{18.5}$$

so that $\varphi(x) = 1$ if $x\in S_0$, $\varphi(x) = \infty$ if $x\notin\bigcup_{k\ge0}S_k$ and $\varphi(x) = N_k$ if $x\in S_k\setminus S_{k-1}$. Also (and this motivated the construction), we have

$$\{\varphi\le N_k\} = S_k. \tag{18.6}$$

Thus, the function $\varphi$ depends on the three sequences of integers $(n_j(k))_{k\ge0}$, but the notation does not indicate this.
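A direct transcription of (18.5) may help fix ideas. In this Python sketch, the sequences are given by a finite prefix `seq[k] = (n_1(k), n_2(k), n_3(k))`; the round-robin rule deciding which side doubles is a hypothetical example, and `phi` returns $\infty$ outside the last $S_k$ of the prefix.

```python
def S_contains(x, nk):
    """Is x in S_k = prod_j [-2^{n_j(k)}, 2^{n_j(k)}] (18.3), with nk = (n_1(k), n_2(k), n_3(k))?"""
    return all(abs(xj) <= 2 ** nj for xj, nj in zip(x, nk))


def N(k):
    """N_0 = 1 and N_k = 2^(2^k) for k >= 1, as exact Python integers."""
    return 1 if k == 0 else 2 ** (2 ** k)


def phi(x, seq):
    """phi(x) = inf{N_k : x in S_k} (18.5), where seq[k] = (n_1(k), n_2(k), n_3(k))."""
    for k, nk in enumerate(seq):       # the S_k increase with k, so the first hit gives the inf
        if S_contains(x, nk):
            return N(k)
    return float("inf")                # x lies outside every S_k of the prefix


def round_robin(K):
    """Hypothetical sequences for k <= K: the side that doubles cycles through j = 1, 2, 3."""
    n = [0, 0, 0]
    seq = [tuple(n)]
    for k in range(K):
        n[k % 3] += 1                  # exactly one side doubles, so sum_j n_j(k) = k (18.2)
        seq.append(tuple(n))
    return seq


seq = round_robin(9)
```

Because the $S_k$ increase and $N_k$ is strictly increasing, $\varphi(x)\le N_k$ holds exactly when $x\in S_k$, which is (18.6).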


Theorem 18.1.1 Consider the function $\varphi$ as above. Then with probability $\ge 1 - LN^{-10}$, there exists a permutation $\pi$ of $\{1,\dots,N\}$ such that

$$\frac{1}{N}\sum_{i\le N}\varphi\Big(\frac{X_i - Y_{\pi(i)}}{LN^{-1/3}}\Big) \le 2. \tag{18.7}$$


Before we discuss this result, we state an elementary fact, which will be used 
during this discussion. 


Lemma 18.1.2 Assume that $[0,1]^3$ is divided into sets of equal measure $\le\log N/(2N)$. Then if $N$ is large enough, with probability close to one (and certainly $\ge 3/4$), there is one of these sets which contains no point $X_i$.


Informal Proof. The probability for any one of these sets not to contain a point $X_i$ is at least $(1 - \log N/(2N))^N \simeq 1/\sqrt{N}$. There are at least $2N/\log N$ such sets and $1/\sqrt{N}\times N/\log N \gg 1$. The result follows if we pretend that these events are independent because for independent events $(\Omega_\ell)_{\ell\le k}$, the probability that none of the events occurs is $\prod_{\ell\le k}(1 - P(\Omega_\ell)) \le \exp\big(-\sum_{\ell\le k}P(\Omega_\ell)\big)$. The assertion that the events are independent is rigorous for a Poisson point process, and it suffices to compare the actual process with such a process. □
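The orders of magnitude in the informal proof can be checked numerically. This Python sketch (natural logarithms throughout) assumes that each cell has measure exactly $1/M$ with $M = \lceil 2N/\log N\rceil$, so that a uniform point falls in a uniformly chosen cell; it evaluates the heuristic quantities and runs a small simulation. It illustrates the lemma, it does not prove it.

```python
import math
import random


def heuristic(N):
    """Quantities used in the informal proof of Lemma 18.1.2 (natural logarithms)."""
    M = math.ceil(2 * N / math.log(N))   # number of cells; each has measure 1/M <= log N/(2N)
    p_empty = (1 - 1 / M) ** N           # P(a given cell receives no X_i), about N^(-1/2)
    expected_empty = M * p_empty         # about 2 sqrt(N)/log N, which tends to infinity
    # If the events "cell is empty" were independent, P(no empty cell) <= exp(-sum of probabilities)
    return M, p_empty, expected_empty, math.exp(-expected_empty)


def simulate(N, trials, seed=0):
    """Fraction of trials in which at least one of the M cells receives no point.

    Only the measure of a cell matters, so each uniform point X_i is replaced by a
    uniformly chosen cell label (valid because all cells have measure exactly 1/M).
    """
    rng = random.Random(seed)
    M = math.ceil(2 * N / math.log(N))
    hits = 0
    for _ in range(trials):
        occupied = {rng.randrange(M) for _ in range(N)}
        if len(occupied) < M:
            hits += 1
    return hits / trials


M, p_empty, expected_empty, bound = heuristic(10 ** 4)
```

For $N = 10^4$ one gets $M = 2172$ cells, each empty with probability about $0.01$, so about 21 empty cells are expected.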


Let us now argue that Theorem 18.1.1 is sharp. Denoting by $\lambda$ Lebesgue's measure, the function $\varphi$ satisfies $\lambda(\{\varphi\le N_k\}) = \lambda(S_k) = 2^{k+3}$. This condition restricts the growth of the function $\varphi$. We will show that it is basically necessary.


Proposition 18.1.3. Assume that $\varphi$ is of the type (18.5), but without requiring that $\sum_{j\le3}n_j(k) = k$. Assume that for a certain number $C$ and all $N$ large enough, with probability $\ge 1/2$, we can find a permutation $\pi$ such that

$$\frac{1}{N}\sum_{i\le N}\varphi\Big(\frac{X_i - Y_{\pi(i)}}{CN^{-1/3}}\Big) \le 2. \tag{18.8}$$

Then for every $k$ large enough, we have $\lambda(\{\varphi\le N_k\}) \ge 2^k/(LC^3)$.


Proof Without loss of generality, we may assume that the number $C$ in (18.8) equals $2^{n_0}$ for a certain $n_0\ge1$. Let us then fix $k$ and $n\ge 2 + n_0 + \max_{j\le3}n_j(k)$, so that the set $2^{-n+n_0}S_k$ is a rectangle of the type $\prod_{j\le3}[-2^{-m_j},2^{-m_j}]$ where $m_j = n - n_0 - n_j(k)\ge2$. Then we may divide $[0,1]^3$ into sets $A_\ell$ which are translates of $2^{-n+n_0}S_k$. The sets $A_\ell$ are of measure $a := \lambda(2^{-n+n_0}S_k) = 2^{-3n+3n_0}\lambda(S_k)$. Consider then $N = 2^{3n+3}$, so that $N^{-1/3} = 2^{-n-1}$. The sides of the rectangles $A_\ell$ are $2^{-m_j+1} = 2^{-n+n_0+n_j(k)+1} = N^{-1/3}2^{n_0+n_j(k)+2}$. If $(Y_i)_{i\le N}$ are evenly spread, for each set $A_\ell$, there is a point $Y_\ell$ within distance $10N^{-1/3}$ of its center, and this point is well inside of $A_\ell$, say, if $A_\ell = c_\ell + 2^{-n+n_0}S_k$, then $Y_\ell\in c_\ell + 2^{-n+n_0-1}S_k$.³
Let us assume if possible that

$$a = 2^{-3n+3n_0}\lambda(S_k) \le \frac{\log N}{2N}. \tag{18.9}$$


Applying Lemma 18.1.2, then with probability $> 0$, there will at the same time exist a matching as in (18.8), and one of the sets $A_\ell$ will not contain any point $X_i$. The corresponding point $Y_\ell\in c_\ell + 2^{-n+n_0-1}S_k$ near the center of $A_\ell$ can only be matched to a point $X_i\notin A_\ell = c_\ell + 2^{-n+n_0}S_k$. Then $Y_\ell - X_i\notin 2^{-n+n_0-1}S_k = 2^{n_0}N^{-1/3}S_k$, so that $(Y_\ell - X_i)/(2^{n_0}N^{-1/3})\notin S_k = \{\varphi\le N_k\}$ and thus $\varphi((Y_\ell - X_i)/(2^{n_0}N^{-1/3})) > N_k$. On the other hand, since $X_i$ and $Y_\ell$ are matched together and $\varphi$ is even (the sets $S_k$ being symmetric), (18.8) implies that $\varphi((Y_\ell - X_i)/(2^{n_0}N^{-1/3})) \le 2N$. In particular, we have shown that $N_k < 2N$. Turning things around, when $2N\le N_k$, (18.9) must fail, that is, we have $\log N/(2N) < 2^{-3n+3n_0}\lambda(S_k) = 2^{3n_0+3}\lambda(S_k)/N$, and thus $\log N < 2^{3n_0+4}\lambda(S_k) \le LC^3\lambda(S_k)$. Choosing $n$ as large as possible with $2N = 2^{3n+4}\le N_k$ yields $N_k < 16N$ and $2^k/L\le\log N\le LC^3\lambda(S_k)$, which is the desired result. □


Exercise 18.1.4 Consider a convex function $\psi\ge0$ on $\mathbb{R}^3$, with $\psi(0) = 0$, which is allowed to take infinite values. Assume that it satisfies the following:

$$\forall u\ge1,\quad \lambda(\{\psi\le u\}) \ge \log u, \tag{18.10}$$
$$\varepsilon_1,\varepsilon_2,\varepsilon_3 = \pm1 \Rightarrow \psi(\varepsilon_1x_1,\varepsilon_2x_2,\varepsilon_3x_3) = \psi(x_1,x_2,x_3), \tag{18.11}$$
$$\psi(1,0,0)\le1;\quad \psi(0,1,0)\le1;\quad \psi(0,0,1)\le1. \tag{18.12}$$

Then there are a constant $L$ and a function $\varphi$ as in Theorem 18.1.1 with $\psi(x)\le\varphi(Lx)$. Hint: All it takes is to observe that a convex set invariant by the symmetries around the coordinate planes is basically a rectangular box with sides parallel to the coordinate axes.


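As a consequence of this exercise, Theorem 18.1.1 also applies to such functions. Consider, for example, $\alpha_1,\alpha_2,\alpha_3\in\,]0,\infty]$ with

$$\frac{1}{\alpha_1} + \frac{1}{\alpha_2} + \frac{1}{\alpha_3} = 1$$

and the function

$$\psi(x_1,x_2,x_3) = \exp\Big(\frac13\big(|x_1|^{\alpha_1} + |x_2|^{\alpha_2} + |x_3|^{\alpha_3}\big)\Big) - 1.$$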


³ Note that without loss of generality, we may assume $n_0\ge10$ to give us all the room we need.
Here, we define $|x|^\infty = 0$ if $|x|\le1$ and $|x|^\infty = \infty$ if $|x| > 1$. Then

$$\{\psi\le u\} \supset \big\{(x_1,x_2,x_3)\,;\ \forall j\le3,\ |x_j|\le(\log(1+u))^{1/\alpha_j}\big\},$$

and consequently,

$$\lambda(\{\psi\le u\}) \ge \log(1+u).$$
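The inclusion above can be tested numerically. In this Python sketch, the function names are ours, `abs_pow` implements the convention for $|x|^\infty$, and the three choices of $(\alpha_1,\alpha_2,\alpha_3)$ used in the check are hypothetical examples satisfying $1/\alpha_1 + 1/\alpha_2 + 1/\alpha_3 = 1$.

```python
import math
import random


def abs_pow(x, a):
    """|x|^a, with the convention |x|^inf = 0 if |x| <= 1 and +inf otherwise."""
    if math.isinf(a):
        return 0.0 if abs(x) <= 1 else math.inf
    return abs(x) ** a


def psi(x, alphas):
    """psi(x) = exp((|x_1|^a1 + |x_2|^a2 + |x_3|^a3) / 3) - 1."""
    s = sum(abs_pow(xj, aj) for xj, aj in zip(x, alphas)) / 3
    return math.inf if math.isinf(s) else math.expm1(s)


def box_half_sides(u, alphas):
    """Half-sides (log(1+u))^(1/a_j) of the box contained in {psi <= u}."""
    ell = math.log1p(u)
    return [1.0 if math.isinf(a) else ell ** (1 / a) for a in alphas]
```

Inside the box, each $|x_j|^{\alpha_j}$ is at most $\log(1+u)$, so their average is too, which is why $\psi\le u$ there; the box has volume $8\log(1+u)$.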


Thus, Theorem 18.1.1 proves in this setting the “ultimate matching conjecture” of 
Problem 17.1.2. 

The special case $\alpha_1 = \alpha_2 = \alpha_3 = 3$ is essentially the case where $\psi(x) = \exp(\|x\|^3)$. It was proved earlier by J. Yukich using the so-called transportation method (unpublished), but the transportation method seems powerless to prove anything close to Theorem 18.1.1. This special case shows that with probability $\ge 1 - LN^{-10}$, we can find a matching for which

$$\sum_{i\le N}\exp\big(Nd(X_i,Y_{\pi(i)})^3/L\big) \le 2N,$$

so that in particular $\sum_{i\le N}d(X_i,Y_{\pi(i)})^3\le L$ (since $x\le\exp x$) and for each $i$, $\exp(Nd(X_i,Y_{\pi(i)})^3/L)\le 2N$, which implies $\max_{i\le N}d(X_i,Y_{\pi(i)})\le LN^{-1/3}(\log N)^{1/3}$ (a result first obtained by J. Yukich and P. Shor in [97]).


Research Problem 18.1.5 Find a proof of Theorem 18.1.1 a few pages long. The 
current proof occupies the entire chapter. 


18.2 Regularization of $\varphi$


For purely technical reasons, we will not be able to work directly with the function $\varphi$, so in this section, we construct a regularized version of it. Our goal is to prove the following:


Proposition 18.2.1 There exists a function $\varphi^*$ with $\varphi^*(0) = 0$ that satisfies the following properties:

$$\forall k\ge0,\quad 8S_k\subset\{\varphi^*\le N_k\}\subset 16S_k, \tag{18.13}$$
$$\text{the set } \{\varphi^*\le u\} \text{ is convex for each } u\ge0, \tag{18.14}$$
$$\forall x,\quad \varphi^*(x) = \varphi^*(-x), \tag{18.15}$$
$$u\ge N_5 \Rightarrow \frac34\{\varphi^*\le u\}\subset\Big\{\varphi^*\le\frac u4\Big\}. \tag{18.16}$$

The crucial new property of $\varphi^*$ compared to $\varphi$ is (18.16). Please note that condition (18.14) does not say that $\varphi^*$ is convex.

Let us start with some auxiliary constructions. For each $j\le3$, we have $n_j(k)\le n_j(k+1)\le n_j(k)+1$. Thus, the sequence $(n_j(k))_{k\ge0}$ takes all the values $0\le n\le n_j^* := \sup_k n_j(k)\in\mathbb{N}\cup\{\infty\}$. For $n\le n_j^*$, $n\in\mathbb{N}$, we define

$$k_j(n) = \inf\{k\,;\ n_j(k) = n\}. \tag{18.17}$$

In particular, $k_j(0) = 0$, and, as will be of constant use, $k_j(n+1)\ge k_j(n)+1$. We then define a function $\theta_j:\mathbb{R}^+\to\mathbb{R}^+\cup\{\infty\}$ by the following properties:

$$0\le t\le 2^3 \Rightarrow \theta_j(t) = 0. \tag{18.18}$$
$$n\le n_j^* \Rightarrow \theta_j(2^{n+3}) = \log N_{k_j(n)}. \tag{18.19}$$
$$\theta_j \text{ is linear between } 2^{n+3} \text{ and } 2^{n+4} \text{ for } 0\le n<n_j^*. \tag{18.20}$$
$$t > 2^{n_j^*+3} \Rightarrow \theta_j(t) = \infty. \tag{18.21}$$

Obviously, this function is non-decreasing.


Lemma 18.2.2 For $k\ge0$, we have

$$[0,2^{n_j(k)+3}]\subset\{\theta_j\le\log N_k\}\subset[0,2^{n_j(k)+4}]. \tag{18.22}$$

Proof Let $n = n_j(k)$ so that by definition of $k_j(n)$, we have $k_j(n)\le k<k_j(n+1)$, and by (18.19) $\theta_j(2^{n+3}) = \log N_{k_j(n)}\le\log N_k$ so that $[0,2^{n+3}]\subset\{\theta_j\le\log N_k\}$. Next, for $t>2^{n+4}$, by (18.19) again, we have $\theta_j(t)\ge\log N_{k_j(n+1)}>\log N_k$ so that $\{\theta_j\le\log N_k\}\subset[0,2^{n+4}]$, and (18.22) is proved. □


Lemma 18.2.3 For $t\in\mathbb{R}^+$, we have

$$\theta_j(t)\ge\log N_4 + 2\log2 \Rightarrow \theta_j(3t/4)\le\theta_j(t) - 2\log2. \tag{18.23}$$

Proof A first observation is that for $n\ge1$, the slope of $\theta_j$ on the interval $[2^{n+3},2^{n+4}]$ is (recalling that $\log N_k = 2^k\log2$ for $k\ge1$)

$$2^{-n-3}\big(\log N_{k_j(n+1)} - \log N_{k_j(n)}\big) = 2^{-n-3}\big(2^{k_j(n+1)} - 2^{k_j(n)}\big)\log2.$$

A technical problem here is that there seems to be no reason why this quantity would increase with $n$. On the other hand, since $k_j(n)\le k_j(n+1)-1$, this slope is at least

$$\chi(n) := \log2\times 2^{k_j(n+1)-n-4}, \tag{18.24}$$




which satisfies $\chi(n)\le\chi(n+1)$. As a consequence, the slope of $\theta_j$ on the interval $[2^{n+3},\infty[$ is at least $\chi(n)$. To prove (18.23), we may assume $\theta_j(3t/4)>\log N_4$, for otherwise $\theta_j(3t/4)\le\log N_4\le\theta_j(t)-2\log2$ and the proof is finished. Thus, $\{\theta_j\le\log N_4\}\subset[0,3t/4]$, and from (18.18), we have $3t/4\ge2^3$. Consider the largest integer $n^*$ with $2^{n^*+3}\le 3t/4$, so that $3t/4<2^{n^*+4}$ and thus

$$\log N_4<\theta_j(3t/4)\le\theta_j(2^{n^*+4}) = \log N_{k_j(n^*+1)},$$

and hence $k_j(n^*+1)>4$. Since $3t/4\ge2^{n^*+3}$, we have $t/4\ge2^{n^*+1}$, so that $t - 3t/4 = t/4\ge2^{n^*+1}$. As we noted, the slope of $\theta_j$ on the interval $[2^{n^*+3},\infty[$ is everywhere at least $\chi(n^*)$. Thus, we have proved that

$$\theta_j(t) - \theta_j(3t/4)\ge 2^{n^*+1}\chi(n^*) = \log2\times2^{k_j(n^*+1)-3}\ge2\log2. \qquad\square$$


Proof of Proposition 18.2.1. We define $\psi_j(t) = |t|/8$ for $|t|<1$ and $\psi_j(t) = \exp\theta_j(|t|)$ for $|t|\ge1$. We define

$$\varphi^*(x_1,x_2,x_3) = \max_{j\le3}\psi_j(x_j), \tag{18.25}$$

so that (18.13) follows from (18.22) and the equality $\{\varphi^*\le u\} = \prod_{j\le3}\{\psi_j\le u\}$, which also proves (18.14). Certainly, (18.15) is obvious.

We turn to the proof of (18.16). Consider $u\ge N_5$. Given $x$ with $\varphi^*(x)\le u$, we have to prove that $\varphi^*(3x/4)\le u/4$. Since $\varphi^*(x)\le u$, we have $\max_{j\le3}\theta_j(|x_j|)\le\log u$ so that, since $\log u\ge\log N_5$, we have

$$\max\big(\log N_5,\max_{j\le3}\theta_j(|x_j|)\big)\le\log u. \tag{18.26}$$

Now, since $\log N_5 = 2^5\log2\ge4\log2+2^4\log2 = 4\log2+\log N_4$, when $\theta_j(|x_j|)<\log N_4+2\log2$, then

$$\theta_j(3|x_j|/4)\le\theta_j(|x_j|)\le\log N_5-2\log2,$$

where we use that $\theta_j$ is non-decreasing in the first inequality. Now, if $\theta_j(|x_j|)\ge\log N_4+2\log2$, then (18.23) implies that $\theta_j(3|x_j|/4)\le\theta_j(|x_j|)-2\log2$. Consequently,

$$\log\varphi^*(3x/4)\le\max_{j\le3}\theta_j(3|x_j|/4)\le\max\big(\log N_5,\max_{j\le3}\theta_j(|x_j|)\big)-2\log2,$$

and using (18.26), we obtain $\log\varphi^*(3x/4)\le\log u-2\log2 = \log(u/4)$. □
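The construction of $\theta_j$ and $\varphi^*$ can be transcribed and tested numerically. This Python sketch works with a finite prefix of the sequences $n_j(k)$ (generated at random, a hypothetical choice), treats the last value of each prefix as $n_j^*$, and computes $\log\varphi^*$ rather than $\varphi^*$, since $N_k = 2^{2^k}$ overflows floating point very quickly. The checks of (18.22), (18.23), (18.13) and (18.16) on sample points are a sanity check, not a proof.

```python
import math
import random

LOG2 = math.log(2)


def log_N(k):
    """log N_k, with N_0 = 1 and N_k = 2^(2^k) for k >= 1."""
    return 0.0 if k == 0 else (2 ** k) * LOG2


def random_sequences(K, rng):
    """Hypothetical sequences n_j(k), k <= K: at each step one randomly chosen side doubles."""
    n = [0, 0, 0]
    seqs = [[0], [0], [0]]
    for _ in range(K):
        n[rng.randrange(3)] += 1             # keeps sum_j n_j(k) = k, (18.2)
        for j in range(3):
            seqs[j].append(n[j])
    return seqs


def make_theta(nj):
    """theta_j of (18.18)-(18.21) from the prefix nj = [n_j(0), ..., n_j(K)], with n_j^* = nj[-1]."""
    kj = {}
    for k, n in enumerate(nj):
        kj.setdefault(n, k)                  # k_j(n) = inf{k : n_j(k) = n}, (18.17)
    n_star = nj[-1]

    def theta(t):
        if t <= 8:
            return 0.0                       # (18.18)
        if t > 2 ** (n_star + 3):
            return math.inf                  # (18.21)
        n = 0
        while 2 ** (n + 4) < t:              # locate t in ]2^(n+3), 2^(n+4)]
            n += 1
        lo = 2 ** (n + 3)
        a, b = log_N(kj[n]), log_N(kj[n + 1])    # end-point values, (18.19)
        return a + (b - a) * (t - lo) / lo       # linear interpolation, (18.20)

    return theta


def make_log_phi_star(seqs):
    """log phi*(x) for phi*(x) = max_j psi_j(x_j), (18.25)."""
    thetas = [make_theta(nj) for nj in seqs]

    def log_psi(theta, t):
        t = abs(t)
        if t < 1:
            return math.log(t / 8) if t > 0 else -math.inf   # psi_j(t) = |t|/8
        return theta(t)                                       # psi_j(t) = exp(theta_j(|t|))

    return lambda x: max(log_psi(th, xj) for th, xj in zip(thetas, x))


rng = random.Random(1)
seqs = random_sequences(30, rng)
log_phi_star = make_log_phi_star(seqs)
```

Working in the log domain is only a numerical convenience; all the inequalities checked are monotone in $\varphi^*$, so they are unchanged by taking logarithms.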
18.3 Discretization


We now think of $N$ as fixed and large. Consider a universal constant $L^*$ which will be determined later. We define $p$ as

$$\text{the largest integer such that } L^*2^{3p}\le N, \tag{18.27}$$

so that $p\ge2$ for $N$ large. Let $G = \{1,\dots,2^p\}^3$. We denote by $\tau$ the generic element of $G$. To each $\tau = (\tau_j)_{j\le3}$ corresponds the small cube $H_\tau := \prod_{j\le3}\,]2^{-p}(\tau_j-1),2^{-p}\tau_j]$ of side $2^{-p}$ of which $2^{-p}\tau$ is a vertex. These cubes form a partition of $]0,1]^3$. The idea is simply that since we are not interested in what happens at scales less than $N^{-1/3}$, we replace $H_\tau$ by the point $2^{-p}\tau$, and $G$ is a discrete model for $[0,1]^3$ (although we have to keep in mind the scaling factor $2^{-p}$).

Recalling our evenly spread points $Y_i$, we define $Z_i = \tau$ where $\tau\in G$ is determined by $Y_i\in H_\tau$. We note that each coordinate of $Y_i$ differs by at most $2^{-p}$ from the corresponding coordinate of $2^{-p}Z_i$, which for further use we express as (recalling that $S_0 = [-1,1]^3$ by (18.3))

$$2^pY_i - Z_i\in S_0. \tag{18.28}$$

We set

$$n(\tau) = \operatorname{card}\{i\le N\,;\ Z_i = \tau\} = \operatorname{card}\{i\le N\,;\ Y_i\in H_\tau\}. \tag{18.29}$$

Lemma 18.3.1 If $L^*$ is large enough, there exists an integer $m_0\ge L^*/2$ such that

$$\forall\tau\in G,\quad m_0\le n(\tau)\le2m_0 \tag{18.30}$$

and

$$m_0 2^{3p}\le N\le 2m_0 2^{3p}. \tag{18.31}$$
Proof We first observe that (18.31) follows from summation of the inequalities (18.30) over $\tau\in G$.

Since the points $Y_i$ are evenly spread, there exists a partition of $[0,1]^3$ in rectangular boxes $R_i$ with $Y_i\in R_i\subset B(Y_i,10N^{-1/3})$. Each of these boxes has volume $N^{-1}$. Let us fix $\tau\in G$. Let $W_1$ be the union of the $R_i$ such that $R_i\subset H_\tau$, and observe that $N\lambda(W_1)$ is just the number of these boxes $R_i\subset H_\tau$. When $R_i\subset W_1$, we have $Y_i\in H_\tau$, so that

$$N\lambda(W_1)\le\operatorname{card}\{i\le N\,;\ Y_i\in H_\tau\} = n(\tau). \tag{18.32}$$
Let $W_2$ be the union of the $R_i$ such that $R_i\cap H_\tau\neq\emptyset$, so that $W_2\supset H_\tau$. When $Y_i\in H_\tau$, we have $R_i\cap H_\tau\neq\emptyset$ so that $R_i\subset W_2$ and

$$n(\tau) = \operatorname{card}\{i\le N\,;\ Y_i\in H_\tau\}\le N\lambda(W_2). \tag{18.33}$$

On the other hand, we have $\lambda(W_1)\le\lambda(H_\tau) = 2^{-3p}\le\lambda(W_2)$ so that $N\lambda(W_1)\le N2^{-3p}\le N\lambda(W_2)$, and combining with (18.32) and (18.33), we obtain

$$|n(\tau) - N2^{-3p}|\le N\lambda(W_2\setminus W_1), \tag{18.34}$$

where $W_2\setminus W_1$ is the union of the boxes $R_i$ for which $R_i\cap H_\tau\neq\emptyset$ and $R_i\cap H_\tau^c\neq\emptyset$. Since $R_i$ is of diameter $\le20N^{-1/3}$, every point of $W_2\setminus W_1$ is within distance $20N^{-1/3}$ of the boundary of $H_\tau$. Since $L^*2^{3p}\le N$, we have $N^{-1/3}\le2^{-p}(L^*)^{-1/3}$ so that when $L^*$ is large, we have $20N^{-1/3}\ll2^{-p}$. We should then picture $W_2\setminus W_1$ as contained in a thin shell around the boundary of $H_\tau$, and the volume of this shell is a small proportion of the volume of $H_\tau$, say less than $1/3$ of this volume: $\lambda(W_2\setminus W_1)\le\lambda(H_\tau)/3 = 2^{-3p}/3$, and then (18.34) yields

$$\frac23N2^{-3p}\le n(\tau)\le\frac43N2^{-3p}.$$

Since $N2^{-3p}\ge L^*$, the smallest integer $m_0\ge2N2^{-3p}/3$ satisfies $m_0\ge L^*/2$ and $4N2^{-3p}/3\le2m_0$ so that

$$\forall\tau\in G,\quad m_0\le n(\tau)\le2m_0. \qquad\square$$


We will now forget about $L^*$ until the very end of the proof. We will prove results that hold when $m_0\ge L$ for a large universal constant $L$, a condition which can be achieved by taking $L^*$ large enough.
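A regular cubic grid gives a concrete family of evenly spread points: with $N = M^3$, the centers of the cubes of side $1/M$ qualify, since each cube has volume $1/N$ and lies within distance $\sqrt3/(2M)\le10N^{-1/3}$ of its center. The Python sketch below, in which this grid and the value $L^* = 1000$ are hypothetical choices, computes $p$ from (18.27), the counts $n(\tau)$ of (18.29), and the integer $m_0$ of the proof of Lemma 18.3.1.

```python
from collections import Counter


def choose_p(N, L_star):
    """Largest integer p with L_star * 2^(3p) <= N, as in (18.27)."""
    p = 0
    while L_star * 2 ** (3 * (p + 1)) <= N:
        p += 1
    return p


def grid_counts(M, p):
    """n(tau) of (18.29) for the grid Y = ((a+1/2)/M, (b+1/2)/M, (c+1/2)/M), N = M^3.

    Z_i = tau means Y_i in H_tau, i.e. tau_j = ceil(2^p * Y_ij) for each coordinate j.
    """
    P = 2 ** p
    # exact integer ceiling of P * (2a + 1) / (2M), avoiding floating-point rounding
    axis = Counter(-(-P * (2 * a + 1) // (2 * M)) for a in range(M))
    # the grid is a product set, so n(tau) factorises over the three coordinates
    return {(t1, t2, t3): axis[t1] * axis[t2] * axis[t3]
            for t1 in axis for t2 in axis for t3 in axis}


M = 61
N = M ** 3
p = choose_p(N, L_star=1000)
n = grid_counts(M, p)
m0 = -(-2 * N // (3 * 8 ** p))   # smallest integer >= 2 N 2^(-3p) / 3, as in the proof
```

Here $M = 61$ is deliberately not a multiple of $2^p = 4$, so the cubes $H_\tau$ receive unequal numbers of points (between $15^3$ and $16^3$), and (18.30) is a genuine check.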
Let us recall that each of the evenly spread points $Y_i$ belongs to a little box $R_i$ of volume $1/N$. For each $\tau\in G$, we define $K_\tau$ as the union of the boxes $R_i$ for which the corresponding point $Y_i\in H_\tau$. We define the r.v.s $U_i$ by $U_i = \tau$ where $\tau$ is the random point of $G$ determined by $X_i\in K_\tau$. Since the boxes $R_i$ have a diameter $\le20N^{-1/3}$ and since $N^{-1}\le2^{-3p}/L^*$, assuming $L^*\ge10^6$, given any point of $H_\tau$ and any point of $K_\tau$, their difference has coordinates $\le2^{-p+1}$. In particular, we have

$$2^pX_i - U_i\in2S_0 = [-2,2]^3. \tag{18.35}$$




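The r.v.s $U_i$ are i.i.d. of law $\mu$ (since $\lambda(K_\tau) = n(\tau)/N$) where $\mu$ is the probability measure on $G$ given by

$$\forall\tau\in G,\quad \mu(\{\tau\}) = \frac{n(\tau)}{N}, \tag{18.36}$$

so that according to (18.30),

$$\forall\tau\in G,\quad \frac{m_0}{N}\le\mu(\{\tau\})\le\frac{2m_0}{N}. \tag{18.37}$$

Thus, $\mu$ is nearly uniform on $G$. To each function $w:G\to\mathbb{R}$, we associate the function $h_w:G\to\mathbb{R}$ given by

$$h_w(\tau) = \inf\{w(\tau') + \varphi^*(\tau-\tau')\,;\ \tau'\in G\}. \tag{18.38}$$

Since $\varphi^*(0) = 0$, we have

$$h_w\le w, \tag{18.39}$$

and we define⁴

$$\Delta(w) = \int(w - h_w)\,d\mu\ge0. \tag{18.40}$$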

Since $\varphi^*\ge0$ and $G$ is finite, we have $\Delta(w)<\infty$. The crucial ingredient for Theorem 18.1.1 is the following discrepancy bound:

Theorem 18.4.1 Consider an i.i.d. sequence of r.v.s $(U_i)_{i\le N}$ distributed like $\mu$. Then with probability $\ge1-L\exp(-100p)$, the following occurs:

$$\forall w:G\to\mathbb{R},\quad \Big|\sum_{i\le N}\Big(h_w(U_i) - \int h_w\,d\mu\Big)\Big|\le L\sqrt{m_0}\,2^{3p/2}(\Delta(w)+1). \tag{18.41}$$

The essential difficulty in a statement of this type is to understand which kind of information on the function $h_w$ we may obtain from the fact that $\Delta(w)$ is given. In very general terms, there is no choice: we must extract information showing that such functions "do not vary wildly" so that we may bound the left-hand side of (18.41) with overwhelming probability. In still rather general terms, we shall prove that control of $\Delta(w)$ implies a kind of local Lipschitz condition on $h_w$. This is the goal of Sect. 18.5. This local Lipschitz condition implies in turn a suitable control on the coefficients of a Haar basis expansion of $h_w$, and this will allow us to conclude. The proof does not explicitly use chaining, although it is in a similar spirit. The formulation in an abstract setting of the principle behind this proof is a possible topic for further research.

⁴ The use of the notation $\Delta$ has nothing to do with the Laplacian and everything to do with the fact that $\Delta(w)$ "measures the size of the difference between $w$ and $h_w$".

In the remainder of this section, we first prove a matching theorem related 
to the bound (18.41), and we then use Theorem 18.4.1 to complete the proof of 
Theorem 18.1.1. 


Theorem 18.4.2 There exists a constant $L_1$ such that the following occurs. Assume that

$$m_0\ge L_1. \tag{18.42}$$

Consider points $(U_i)_{i\le N}$ for which (18.41) holds. Then there exists a permutation $\pi$ of $\{1,\dots,N\}$ for which

$$\sum_{i\le N}\varphi^*(U_i - Z_{\pi(i)})\le N. \tag{18.43}$$


Proof First we deduce from Proposition 4.3.2 that

$$\inf_\pi\sum_{i\le N}\varphi^*(U_i - Z_{\pi(i)}) = \sup\sum_{i\le N}(w_i + w'_i), \tag{18.44}$$

where the supremum is over all families $(w_i)_{i\le N}$ and $(w'_i)_{i\le N}$ for which

$$\forall i,j\le N,\quad w_i + w'_j\le\varphi^*(U_i - Z_j). \tag{18.45}$$

Given such families $(w_i)$ and $(w'_i)$, for $\tau\in G$, let us then define

$$h(\tau) = \inf_{j\le N}\big(-w'_j + \varphi^*(\tau - Z_j)\big), \tag{18.46}$$

so that from (18.45), we obtain $w_i\le h(U_i)$ and thus

$$\sum_{i\le N}(w_i + w'_i)\le\sum_{i\le N}(h(U_i) + w'_i). \tag{18.47}$$

For $\tau\in G$, we define

$$w(\tau) := \inf\{-w'_j\,;\ Z_j = \tau\}, \tag{18.48}$$

so that, taking in (18.46) the infimum first at a given value $\tau'$ of $Z_j$, we obtain

$$h(\tau) = \inf\{w(\tau') + \varphi^*(\tau - \tau')\,;\ \tau'\in G\}, \tag{18.49}$$

and consequently, recalling the notation (18.38),

$$h(\tau) = h_w(\tau).$$

Also, (18.48) implies

$$-w(\tau) = \sup\{w'_j\,;\ Z_j = \tau\},$$

so that, using (18.36),

$$\sum\{w'_j\,;\ Z_j = \tau,\ j\le N\}\le\operatorname{card}\{j\,;\ Z_j = \tau\}\,\sup\{w'_j\,;\ Z_j = \tau\} = -N\mu(\{\tau\})w(\tau),$$

and by summation of these inequalities over $\tau\in G$,

$$\sum_{i\le N}w'_i\le-N\int w\,d\mu. \tag{18.50}$$

Consequently,

$$\begin{aligned}\sum_{i\le N}(w_i + w'_i)&\le\sum_{i\le N}h(U_i) - N\int w\,d\mu\\&=\sum_{i\le N}\Big(h(U_i) - \int h\,d\mu\Big) - N\int(w - h)\,d\mu\\&=\sum_{i\le N}\Big(h(U_i) - \int h\,d\mu\Big) - N\Delta(w).\end{aligned}\tag{18.51}$$

Now (18.41) implies, since $h = h_w$,

$$\sum_{i\le N}\Big(h(U_i) - \int h\,d\mu\Big)\le L\sqrt{m_0}\,2^{3p/2}(\Delta(w)+1), \tag{18.52}$$

and combining with (18.47) and (18.51), we have proved that all families $(w_i)$ and $(w'_i)$ as in (18.45) satisfy

$$\sum_{i\le N}(w_i + w'_i)\le L\sqrt{m_0}\,2^{3p/2}(\Delta(w)+1) - N\Delta(w).$$

Recalling that $N\ge m_0 2^{3p}$ by (18.31), we obtain that for $m_0\ge L_1$, the right-hand side is $\le N$. □
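The steps (18.45) to (18.50) can be checked on a toy instance. This Python sketch uses a two-dimensional $G = \{1,2\}^2$ and $N = 6$ so that the infimum over permutations can be computed by brute force; `phi_star` is a hypothetical stand-in for $\varphi^*$. It takes arbitrary $w'_j$, chooses each $w_i$ as large as (18.45) allows, and verifies (18.45), $h = h_w$, (18.50), and the inequality "sup $\le$ inf" in (18.44). The equality in (18.44), which comes from Proposition 4.3.2, is not checked here.

```python
import itertools
import random


def diff(a, b):
    return tuple(x - y for x, y in zip(a, b))


def phi_star(d):
    # hypothetical stand-in for phi*: nonnegative, even, phi_star(0) = 0
    return sum(x * x for x in d)


def min_matching_cost(U, Z):
    """inf over permutations pi of sum_i phi*(U_i - Z_pi(i)); brute force, tiny N only."""
    n = len(U)
    return min(sum(phi_star(diff(U[i], Z[pi[i]])) for i in range(n))
               for pi in itertools.permutations(range(n)))


def c_transform(U, Z, wp):
    """w_i = min_j (phi*(U_i - Z_j) - w'_j), the largest w_i allowed by (18.45)."""
    return [min(phi_star(diff(u, z)) - b for z, b in zip(Z, wp)) for u in U]


rng = random.Random(0)
G = list(itertools.product((1, 2), repeat=2))          # a 2D toy version of G
Z = G + [(1, 1), (2, 2)]                               # every tau in G is some Z_j
N = len(Z)
U = [rng.choice(G) for _ in range(N)]
wp = [rng.uniform(-2, 2) for _ in range(N)]            # arbitrary w'_j
wi = c_transform(U, Z, wp)

w = {t: min(-b for z, b in zip(Z, wp) if z == t) for t in G}                  # (18.48)
h = {t: min(-b + phi_star(diff(t, z)) for z, b in zip(Z, wp)) for t in G}     # (18.46)
h_w = {t: min(w[s] + phi_star(diff(t, s)) for s in G) for t in G}            # (18.49)
mu = {t: sum(z == t for z in Z) / N for t in G}                              # (18.36)
```

Every $\tau\in G$ is some $Z_j$, as in the text where $n(\tau)\ge m_0\ge1$, so $w$ is finite everywhere.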
The idea to prove Theorem 18.1.1 is the obvious one: when we match the points $U_i$ and $Z_{\pi(i)}$ in the discretized version of the problem, we will match the points $X_i$ and $Y_{\pi(i)}$ in the original problem, and the next result allows us to use the information provided by (18.43).


Lemma 18.4.3 For any $i,j\le N$, we have

$$\varphi(2^{p-6}(X_i - Y_j))\le1 + \varphi^*(U_i - Z_j). \tag{18.53}$$

Proof Assume first that $U_i - Z_j\in16S_0$. Combining with (18.28) and (18.35), we obtain $2^p(X_i - Y_j) = 2^pX_i - U_i + U_i - Z_j + Z_j - 2^pY_j\in19S_0$ so that in particular $2^{p-6}(X_i - Y_j)\in S_0$ and since $\varphi = 1$ on $S_0$, we get $\varphi(2^{p-6}(X_i - Y_j)) = 1$, proving (18.53).

Assume next that $U_i - Z_j\notin16S_0$, and consider the smallest $k$ for which $U_i - Z_j\in16S_k$ (if no such $k$ exists, there is nothing to prove since then $\varphi^*(U_i - Z_j) = \infty$ by (18.13)), so that $k\ge1$. Since $U_i - Z_j\notin16S_{k-1}$, we have $\varphi^*(U_i - Z_j)>N_{k-1}$ by (18.13). Since $S_k\subset2S_{k-1}$ by (18.4), we have $U_i - Z_j\in16S_k\subset32S_{k-1}$. Combining with (18.28) and (18.35), we obtain $2^p(X_i - Y_j)\in35S_{k-1}$ so that $2^{p-6}(X_i - Y_j)\in S_{k-1}$ and $\varphi(2^{p-6}(X_i - Y_j))\le N_{k-1}$ by (18.6). We have proved (18.53). □

Corollary 18.4.4 Consider a permutation $\pi$ of $\{1,\dots,N\}$ such that (18.43) holds. Then (18.7) holds.

Proof This is obvious from (18.53) because by definition of $p$, we have $L^*2^{3p+3}>N$ so that $2^p\ge N^{1/3}/L$. Indeed, summing (18.53) over $i$ with $j = \pi(i)$ and using (18.43) yields $\sum_{i\le N}\varphi(2^{p-6}(X_i - Y_{\pi(i)}))\le2N$, and since the sets $S_k$ are convex and symmetric, $\varphi(tx)\le\varphi(x)$ for $0\le t\le1$. □

Proof of Theorem 18.1.1. According to Theorem 18.4.1, (18.41) occurs with probability $\ge1-L\exp(-100p)\ge1-LN^{-10}$. When (18.41) occurs, Theorem 18.4.2 produces a permutation $\pi$ for which (18.43) holds, and Corollary 18.4.4 shows that (18.7) holds for the same permutation. □


18.5 Geometry 


The real work toward the proof of Theorem 18.4.1 starts here. To lighten notation, we write from now on $\varphi$ for $\varphi^*$, so that $\varphi$ satisfies (18.13) to (18.16).

In this section, we carry out the task of finding some "regularity" of the functions $h_w$ (defined in (18.38)) for which $\Delta(w)$ is not too large. In other words, we describe some of the underlying "geometry" of this class of functions.

We define

$$s_j(k) = \min(p,n_j(k));\qquad s(k) = \sum_{j\le3}s_j(k). \tag{18.54}$$
It follows from (18.2) that $n_j(k)\le k$, so that

$$k\le p\Rightarrow s_j(k) = n_j(k) \tag{18.55}$$

and also using again (18.2),

$$k\le p\Rightarrow s(k) = k. \tag{18.56}$$

It is the nature of the problem that different scales must be used. They will appear in terms of the sets we define now. We consider the collection $\mathcal{P}_k$ of subsets of $G$ of the form

$$\prod_{j\le3}\{b_j2^{s_j(k)}+1,\dots,(b_j+1)2^{s_j(k)}\} \tag{18.57}$$

for $b_j\in\mathbb{N}$, $0\le b_j\le2^{p-s_j(k)}-1$. For lack of a better word, subsets of $G$ which are products of three intervals will be called rectangles. There are $2^{3p-s(k)}$ rectangles of the previous type, which form a partition of $G$. Each of these rectangles has cardinality $2^{s(k)}$:

$$C\in\mathcal{P}_k\Rightarrow\operatorname{card}C = 2^{s(k)}. \tag{18.58}$$

We say that a subset $A$ of $G$ is $\mathcal{P}_k$-measurable if it is a union of rectangles belonging to $\mathcal{P}_k$.

Let us say that two rectangles of $\mathcal{P}_k$ are adjacent if for all $j\le3$, the corresponding values of $b_j$ differ by at most 1. Thus, a rectangle is adjacent to itself and to at most 26 other rectangles. Given an integer $q$, let us say that two rectangles of $\mathcal{P}_k$ are $q$-adjacent if for each $j\le3$, the corresponding values of $b_j$ differ by at most $q$. Thus, at most $(2q+1)^3$ rectangles of $\mathcal{P}_k$ are $q$-adjacent to a given rectangle. The elementary proof of the following is better left to the reader. We recall the definition (18.3) of the sets $S_k$.

Lemma 18.5.1

(a) If $C,C'$ in $\mathcal{P}_k$ are adjacent, and if $\tau\in C$, $\tau'\in C'$, then $\tau-\tau'\in2S_k$.
(b) If $\tau\in C\in\mathcal{P}_k$, $A\subset\tau+qS_k$, $A\in\mathcal{P}_k$, then $A$ and $C$ are $q$-adjacent.
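The partitions $\mathcal{P}_k$ and the notion of $q$-adjacency are easy to enumerate on a small grid. In this Python sketch, the values of $(n_1(k),n_2(k),n_3(k))$ used in the checks are hypothetical, and the checks cover the counts stated before (18.58), the bound of 27 adjacent rectangles, and both parts of Lemma 18.5.1 (with $q = 2$ in part (b)).

```python
import itertools


def rect_index(tau, s):
    """(b_1, b_2, b_3) of the rectangle (18.57) of P_k containing tau, s = (s_1(k), s_2(k), s_3(k)).

    tau_j lies in {b_j 2^{s_j} + 1, ..., (b_j + 1) 2^{s_j}}, i.e. b_j = floor((tau_j - 1) / 2^{s_j}).
    """
    return tuple((t - 1) >> sj for t, sj in zip(tau, s))


def partition(p, s):
    """The rectangles of P_k, as a dict mapping b to the list of points of G = {1..2^p}^3."""
    rects = {}
    for tau in itertools.product(range(1, 2 ** p + 1), repeat=3):
        rects.setdefault(rect_index(tau, s), []).append(tau)
    return rects


def q_adjacent(b, b2, q):
    """q-adjacency of two rectangles given by their indices; q = 1 is adjacency."""
    return all(abs(x - y) <= q for x, y in zip(b, b2))
```

Part (a) uses only $s_j(k)\le n_j(k)$: two points in adjacent rectangles differ by at most $2\cdot2^{s_j(k)}-1$ in coordinate $j$.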


Recalling the definition (18.38), given $A\ge0$, we define the class $\mathcal{S}(A)$ of functions on $G$ by

$$\mathcal{S}(A) = \Big\{h_w\,;\ \Delta(w) = \int(w - h_w)\,d\mu\le A\Big\}. \tag{18.59}$$

We next state the main result of this section. The essence of this result is that for $h\in\mathcal{S}(A)$, at each point, there is a scale at which it is appropriate to control the variations of $h$.
Theorem 18.5.2 Given $A\ge1$, for every function $h\in\mathcal{S}(A)$, we can find a partition $(B_k)_{k\ge4}$ of $G$ such that $B_k$ is $\mathcal{P}_k$-measurable and such that for each $C\in\mathcal{C}_k := \{C\in\mathcal{P}_k\,;\ C\subset B_k\}$, we can find a number $z(C)$ such that the following properties hold:

$$\sum_{k\ge4}\sum_{C\in\mathcal{C}_k}2^{s(k)}z(C)\le L2^{3p}A, \tag{18.60}$$
$$k\ge4,\ C\in\mathcal{C}_k\Rightarrow N_k\le z(C)\le N_{k+1}. \tag{18.61}$$

For every $k\ge4$, if $C\in\mathcal{C}_k$ and if $C'\in\mathcal{P}_k$ is adjacent to $C$, then

$$\tau\in C,\ \tau'\in C'\Rightarrow|h(\tau)-h(\tau')|\le z(C). \tag{18.62}$$

Let us stress that (18.62) holds in particular for $C' = C$.

Any point $\tau$ of $G$ belongs to some $B_k$. Close to $\tau$, the relevant scale to control the variations of $h$ is given by the partition $\mathcal{P}_k$. Condition (18.61) controls the size of the numbers $z(C)$, depending on the local scale at which the kind of Lipschitz condition (18.62) holds. The restriction $z(C)\le N_{k+1}$ is essential; the lower bound $z(C)\ge N_k$ is purely technical. Finally, the global size of the weighted quantities $z(C)$ is controlled by (18.60).

In the remainder of this chapter, we shall use the information provided by Theorem 18.5.2 to prove Theorem 18.4.1.

We start the proof of Theorem 18.5.2, which will occupy us until the end of the present section. This proof is fortunately not as formidable as the statement of the theorem itself. Considering a function $h\in\mathcal{S}(A)$, we have to construct the objects of Theorem 18.5.2. By definition of $\mathcal{S}(A)$, we can find a function $w:G\to\mathbb{R}$ such that

$$\forall\tau\in G,\quad h(\tau) = \inf\{w(\tau') + \varphi(\tau-\tau')\,;\ \tau'\in G\}, \tag{18.63}$$

while

$$\int(w - h)\,d\mu\le A. \tag{18.64}$$

For each $\tau$ and $\tau'$ in $G$, we have

$$h(\tau)\le w(\tau') + \varphi(\tau-\tau'),$$

so that

$$w(\tau')\ge h(\tau) - \varphi(\tau-\tau').$$
Let us then define

$$\hat h(\tau') = \sup\{h(\tau) - \varphi(\tau-\tau')\,;\ \tau\in G\}, \tag{18.65}$$

so that

$$h\le\hat h\le w. \tag{18.66}$$

One may think of $\hat h$ as a regularized version of the function $w$. Moreover,

$$\int(\hat h - h)\,d\mu\le\int(w - h)\,d\mu\le A. \tag{18.67}$$

For $C$ in $\mathcal{P}_k$, let us define

$$\gamma(C) = \min_{\tau\in C}\hat h(\tau) - \max_{\tau\in C}h(\tau). \tag{18.68}$$

Thus, for $\tau\in C$, we have $\gamma(C)\le\hat h(\tau) - h(\tau)$, so that

$$\mu(C)\gamma(C)\le\int_C(\hat h(\tau) - h(\tau))\,d\mu(\tau).$$

Using (18.58) and (18.37), we have $\mu(C)\ge2^{s(k)}m_0/N$. Using also that $m_0/N\ge2^{-3p-1}$ by (18.31), we finally obtain

$$2^{s(k)}\gamma(C)\le L2^{3p}\int_C(\hat h(\tau) - h(\tau))\,d\mu(\tau). \tag{18.69}$$
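The regularization $\hat h$ and the quantity $\gamma(C)$ can also be computed by brute force. In this Python sketch, `phi` is a hypothetical stand-in for $\varphi$ and the rectangle `C` is a hypothetical choice; the checks are (18.66), the inequality $\gamma(C)\le\hat h(\tau) - h(\tau)$ for $\tau\in C$, and the first inequality of (18.67) with $\mu$ taken uniform.

```python
import itertools
import random


def diff(a, b):
    return tuple(x - y for x, y in zip(a, b))


def inf_convolution(w, phi, G):
    """h(tau) = min over tau' of w(tau') + phi(tau - tau'), as in (18.63)."""
    return {t: min(w[s] + phi(diff(t, s)) for s in G) for t in G}


def hat(h, phi, G):
    """hat h(tau') = max over tau of h(tau) - phi(tau - tau'), as in (18.65)."""
    return {s: max(h[t] - phi(diff(t, s)) for t in G) for s in G}


def gamma(C, h, h_hat):
    """gamma(C) = min over C of hat h - max over C of h, as in (18.68)."""
    return min(h_hat[t] for t in C) - max(h[t] for t in C)


def phi(d):
    # hypothetical stand-in for phi: even, nonnegative, phi(0) = 0
    return max(abs(x) for x in d) ** 2


G = list(itertools.product(range(1, 5), repeat=3))
rng = random.Random(2)
w = {t: rng.uniform(0, 20) for t in G}
h = inf_convolution(w, phi, G)
h_hat = hat(h, phi, G)
C = [t for t in G if max(t) <= 2]      # the rectangle {1, 2}^3, a hypothetical element of P_k
```

The bound $\hat h\le w$ is exactly the inequality $w(\tau')\ge h(\tau)-\varphi(\tau-\tau')$ displayed before (18.65), maximized over $\tau$.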


Lemma 18.5.3 There exists a disjoint sequence $(L_k)_{k\ge5}$ of $\mathcal{P}_k$-measurable sets with the following properties:

(a) If $A\in\mathcal{P}_k$ and $A\subset L_k$, then $\gamma(A)\ge N_k/2$.
(b) Consider $\ell\ge5$ and $A\in\mathcal{P}_\ell$ with $\gamma(A)\ge N_\ell/2$. Then there exist $\ell'\ge\ell$ and $A'\in\mathcal{P}_{\ell'}$ with $A\subset A'$ and $A'\subset L_{\ell'}$.


Proof If for each $k\ge5$ and each $A\in\mathcal{P}_k$ we have $\gamma(A)<N_k/2$, there is nothing to do. Otherwise, consider the largest $k_0$ for which there exists $A\in\mathcal{P}_{k_0}$ with $\gamma(A)\ge N_{k_0}/2$. For $k>k_0$, we set $L_k = \emptyset$. Let $L_{k_0}$ be the union of all such rectangles $A\in\mathcal{P}_{k_0}$ with $\gamma(A)\ge N_{k_0}/2$. We then construct the sets $L_k$ by decreasing induction over $k$. Having constructed $L_\ell$ for $\ell\ge k$, we define $L_{k-1}$ as the union of all rectangles $A\in\mathcal{P}_{k-1}$ for which $A\not\subset\bigcup_{\ell\ge k}L_\ell$ and $\gamma(A)\ge N_{k-1}/2$. It is obvious that this sequence has the required properties. □


From this point on, we set $q = 32$. The construction of the partition $(B_k)_{k\ge4}$ is obtained in the next result, although it will take further work to prove that this partition has the right properties.
Proposition 18.5.4 There exists a partition $(B_k)_{k\ge4}$ of $G$ consisting of $\mathcal{P}_k$-measurable sets with the following properties:

(a) If $C\in\mathcal{P}_k$, $C\subset B_k$, there exists $A\in\mathcal{P}_k$, $A\subset L_k$ which is $q$-adjacent to $C$.

(b) Consider $C\in\mathcal{P}_k$, $C\subset B_k$, $k\ge5$. Consider $\ell\ge k$. Consider the unique $D\in\mathcal{P}_\ell$ with $C\subset D$. If there exists $A\in\mathcal{P}_\ell$ with $\gamma(A)\ge N_\ell/2$ which is $q$-adjacent to $D$, then $k = \ell$.

Proof If for each $k\ge5$ and each $A\in\mathcal{P}_k$ we have $\gamma(A)<N_k/2$, we set $B_4 = G$ and $B_k = \emptyset$ for $k>4$. Otherwise, consider the sequence $(L_k)_{k\ge5}$ constructed in the previous lemma. We denote by $k_0$ the largest integer for which there exists $A\in\mathcal{P}_{k_0}$ with $\gamma(A)\ge N_{k_0}/2$. We construct the sequence $(B_k)$ as follows. For $k>k_0$, we set $B_k = \emptyset$. We define $B_{k_0}$ as the union of all the rectangles $C\in\mathcal{P}_{k_0}$ which are $q$-adjacent to a rectangle $A\in\mathcal{P}_{k_0}$, $A\subset L_{k_0}$. We then define the sequence $(B_k)_{k\ge5}$ by decreasing induction over $k$. Having defined $B_\ell$ for $\ell\ge k$, we define $B_{k-1}$ as the union of all the rectangles $C\in\mathcal{P}_{k-1}$ which are $q$-adjacent to an $A\in\mathcal{P}_{k-1}$ with $A\subset L_{k-1}$, but for which $C\not\subset\bigcup_{\ell\ge k}B_\ell$. Finally, $B_4$ is what is left after we have constructed $(B_k)_{k\ge5}$.

The property (a) is obvious by construction. To prove (b), we note that by Lemma 18.5.3, there exists $\ell'\ge\ell$ for which the element $A'\in\mathcal{P}_{\ell'}$ with $A\subset A'$ satisfies $A'\subset L_{\ell'}$. Then the element $D'\in\mathcal{P}_{\ell'}$ such that $D\subset D'$ is $q$-adjacent to $A'$. By construction of $B_{\ell'}$, we have $D'\subset\bigcup_{\ell''\ge\ell'}B_{\ell''}$. Since $C\subset D'$ and $C\subset B_k$, and since the sets $(B_\ell)$ form a partition, we have $\ell'\le k$ so that $\ell = \ell' = k$. □


The reader should keep in mind the following important notation which will be used until the end of the chapter. For $k\ge4$, we write

$$\mathcal{C}_k = \{C\in\mathcal{P}_k\,;\ C\subset B_k\}. \tag{18.70}$$

For $C\in\mathcal{C}_k$, $k\ge5$, we set

$$\chi(C) = \max\{\gamma(C')\,;\ C'\in\mathcal{P}_k,\ C'\subset L_k,\ C'\text{ is }q\text{-adjacent to }C\}, \tag{18.71}$$

using that such $C'$ exists by Proposition 18.5.4, (a). Using Lemma 18.5.3, (a) we further have

$$\chi(C)\ge N_k/2. \tag{18.72}$$

Our next goal is to prove the following, which is the main step toward (18.60):

Lemma 18.5.5 We have

$$\sum_{k\ge5}\sum_{C\in\mathcal{C}_k}2^{s(k)}\chi(C)\le L2^{3p}A. \tag{18.73}$$
Proof When $C\in\mathcal{C}_k$, by definition of $\chi(C)$, there exists $C'\in\mathcal{P}_k$ which is $q$-adjacent to $C$ and such that $\chi(C) = \gamma(C')$ and $C'\subset L_k$. Thus,

$$\sum_{k\ge5}\sum_{C\in\mathcal{C}_k}2^{s(k)}\chi(C) = \sum_{k\ge5}\sum_{C\in\mathcal{C}_k}2^{s(k)}\gamma(C')\le(2q+1)^3\sum_{k\ge5}\sum_{C\in\mathcal{P}_k,\,C\subset L_k}2^{s(k)}\gamma(C), \tag{18.74}$$

because a given $C'\in\mathcal{P}_k$ is $q$-adjacent to at most $(2q+1)^3$ sets $C\in\mathcal{C}_k$. Since the sets $L_k$ are disjoint, it follows from (18.69) and (18.67) that the sum on the right-hand side is $\le L2^{3p}A$. □


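For $C\in\mathcal C_k$, $k\ge5$, we set

$$z(C)=\min\bigl(2x(C),N_{k+1}\bigr)\ge N_k,$$

where the inequality follows from (18.72). If $C\in\mathcal C_4$, we set $z(C)=N_5$. Thus, (18.61) holds. Moreover,

$$\sum_{C\in\mathcal C_4}2^4z(C)=N_52^4\operatorname{card}\mathcal C_4\le N_52^4\operatorname{card}\mathcal P_4=N_52^{3p}\le L2^{3p}A$$

since $A\ge1$. Therefore, since $z(C)\le2x(C)$ for $k\ge5$, (18.60) follows from (18.73).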
We turn to the proof of (18.62), the core of Theorem 18.5.2. 


Proposition 18.5.6 If $k\ge4$, $C\in\mathcal C_k$, $C'\in\mathcal P_k$ are adjacent, and $t\in C$, $t'\in C'$, then

$$h(t')\le h(t)+z(C). \tag{18.75}$$

Since $z(C)\ge N_{\max(k,5)}$, there is nothing to prove unless

$$h(t')-h(t)\ge N_{\max(k,5)},$$

so we assume that this is the case in the rest of the argument. It follows from (18.63) that for some $\rho\in G$, we have

$$h(t)=w(\rho)+\varphi(t-\rho). \tag{18.76}$$

We fix such a $\rho$, and we define

$$u=\max\bigl(h(t')-h(t),\varphi(t-\rho)\bigr), \tag{18.77}$$

so that $u\ge N_{\max(k,5)}$. We set
$$U:=\{x\in\mathbb R^3;\ \varphi(x)\le u\}.$$

According to (18.14), this is a convex set, and (18.15) implies that $U=-U$. Consider the largest integer $\ell\ge1$ such that $u\ge N_\ell$. Since $u\ge N_{\max(k,5)}$, we have $\ell\ge\max(k,5)$, and by the definition of $\ell$, we have $u<N_{\ell+1}$. We then use (18.13) as well as $S_{\ell+1}\subset2S_\ell$ to obtain

$$8S_\ell\subset U\subset32S_\ell. \tag{18.78}$$
Lemma 18.5.7 There exists $A\in\mathcal P_\ell$ with

$$A\subset V:=\frac{\rho+t'}2+\frac U8\subset\Bigl(t'+\frac{3U}4\Bigr)\cap\Bigl(\rho+\frac{3U}4\Bigr). \tag{18.79}$$


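Proof Since $C$ and $C'$ are adjacent, it follows from Lemma 18.5.1 (a) that

$$t'-t\in2S_k\subset2S_\ell\subset\frac U4.$$

Since $\varphi(t-\rho)\le u$, we have $t-\rho\in U$, so that $t'-\rho=t'-t+t-\rho\in5U/4$, and therefore $\rho-t'\in5U/4$ since $U$ is symmetric by (18.15). Consequently, using the convexity of $U$,

$$\frac{\rho+t'}2+\frac U8=t'+\frac{\rho-t'}2+\frac U8\subset t'+\frac{5U}8+\frac U8=t'+\frac{3U}4,\qquad\frac{\rho+t'}2+\frac U8=\rho+\frac{t'-\rho}2+\frac U8\subset\rho+\frac{3U}4. \tag{18.80}$$

Thus, defining $V$ as in (18.79), the second inclusion in this relation holds. Even though the point $(\rho+t')/2$ need not be in $G$, since $V\supset(\rho+t')/2+S_\ell$ by (18.78), and since the set $S_\ell$ is twice larger in every direction than an element of $\mathcal P_\ell$, it is obvious that $V$ entirely contains an element $A$ of $\mathcal P_\ell$. □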


Lemma 18.5.8 We have $y(A)\ge u/2$.

Proof Since $u\ge N_5$, it follows from (18.16) that $\varphi(x)\le u/4$ for $x\in3U/4$. Consequently, if $\rho'\in G\cap V$, by (18.79) and the symmetry of $U$ we have $\varphi(t'-\rho')\le u/4$ and $\varphi(\rho'-\rho)\le u/4$. Thus, by (18.63),

$$w(\rho')\ge h(t')-\varphi(t'-\rho')\ge h(t')-\frac u4.$$

Also, by (18.63), we have

$$h(\rho')\le w(\rho)+\varphi(\rho'-\rho)\le w(\rho)+\frac u4.$$

Thus, using (18.76) in the second line and (18.77) in the last line,

$$\begin{aligned}\min_{\rho'\in V\cap G}w(\rho')-\max_{\rho'\in V\cap G}h(\rho')&\ge h(t')-w(\rho)-\frac u2\\&=h(t')-h(t)+\varphi(t-\rho)-\frac u2\\&\ge\max\bigl(h(t')-h(t),\varphi(t-\rho)\bigr)-\frac u2=\frac u2.\end{aligned} \tag{18.81}$$

Since $A\subset V$, this implies the required result by the definition of $y(A)$. □
Proof of Proposition 18.5.6 We will prove that $u\le z(C)$, from which (18.75) follows. Since $t'-t\in U/4$, using (18.79) in the first inclusion, the relation $t'-t\in U/4$ in the second one, and (18.78) in the third one, we have

$$V\subset t'+\frac{3U}4\subset t+U\subset t+32S_\ell. \tag{18.82}$$

Since $\ell\ge k$ and $C\in\mathcal P_k$, there is a unique rectangle $D\in\mathcal P_\ell$ with $C\subset D$, and since $A\subset V$ and $t\in C$, (18.82) and Lemma 18.5.1 (b) imply that $D$ and $A$ are 32-adjacent. Moreover, by Lemma 18.5.8, we have $y(A)\ge u/2\ge N_\ell/2$. By Proposition 18.5.4 (b), we have $\ell=k$. Thus, $D=C\in\mathcal P_k$ and $A\in\mathcal P_k$. Since $A$ and $C$ are 32-adjacent, it then follows from the definition (18.71) that $x(C)\ge y(A)\ge u/2$. Also, $u<N_{\ell+1}=N_{k+1}$, so that $u\le\min(2x(C),N_{k+1})=z(C)$, completing the proof of (18.75). □


Finally, it remains to prove that, with the notation of (18.62),

$$h(t)\le h(t')+z(C). \tag{18.83}$$

For this, we repeat the previous argument, exchanging the roles of $t$ and $t'$, up to (18.82), which we replace by

$$V\subset t+\frac{3U}4\subset t+U\subset t+32S_\ell,$$

and we finish the proof in exactly the same manner. □


18.6 Probability, I 


To prove a discrepancy bound involving the functions of the class S(A) of 
Theorem 18.5.2, we must understand “how they oscillate”. There are two sources 
for such oscillations. 


• The function $h$ oscillates within each rectangle $C\in\mathcal C_k$.
• The value of $h$ may jump when one changes the rectangle $C$.


In the present section, we take care of the first type of oscillation. We show that we can reduce the proof of Theorem 18.4.1 to that of Theorem 18.6.9, that is, to the case where $h$ is constant on each rectangle $C\in\mathcal C_k$. This reduction is easier than the proof of Theorem 18.6.9 itself, but already brings to light the use of the various conditions of Theorem 18.5.2. Throughout the rest of the proof, for $t\in G$, we consider the r.v.


$$Y_t=\operatorname{card}\{i\le N;\ U_i=t\}-N\mu(\{t\}), \tag{18.84}$$

where the $U_i$ are i.i.d. r.v.s on $G$ with $\mathbb P(U_i=t)=\mu(\{t\})$. We recall the number $m_0$ of (18.30), and we note right away that

$$\mathbb EY_t^2\le N\mu(\{t\})\le2m_0,$$

so that we may think of $|Y_t|$ as being typically of size about $\sqrt{m_0}$. Before we start the real work, let us point out a simple principle.


Lemma 18.6.1 Consider a sequence $(a_i)_{i\le M}$ of positive numbers. For a set $I\subset\{1,\dots,M\}$, define $S_I=\sum_{i\in I}a_i$. For an integer $r\le M$, define $A_r=\max\{S_I;\ \operatorname{card}I\le r\}$, and consider $I$ with $\operatorname{card}I=r$ and $S_I=A_r$. Then for $i\notin I$, we have $a_i\le A_r/r$.

Proof Assume without loss of generality that the sequence $(a_i)$ is non-increasing. Then $I=\{1,\dots,r\}$, and $A_r\ge ra_r$, while for $i\notin I$, we have $i>r$, so that $a_i\le a_r\le A_r/r$. □
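As a sanity check on Lemma 18.6.1, here is a small brute-force sketch in Python. It is not part of the original argument, and the helper names are hypothetical. It enumerates every set $I$ of size $r$ that attains $A_r$ and confirms that all remaining $a_i$ are at most $A_r/r$.

```python
from itertools import combinations
import random


def top_r_sum(a, r):
    """A_r = max{S_I : card I <= r}; for positive a_i this is the sum of the r largest entries."""
    return sum(sorted(a, reverse=True)[:r])


def lemma_18_6_1_holds(a, r, tol=1e-12):
    """Check a_i <= A_r / r for every i outside any I with card I = r and S_I = A_r."""
    A_r = top_r_sum(a, r)
    for I in combinations(range(len(a)), r):
        if abs(sum(a[i] for i in I) - A_r) <= tol:  # I attains the maximum A_r
            if any(a[i] > A_r / r + tol for i in range(len(a)) if i not in I):
                return False
    return True


random.seed(0)
samples = [[random.random() for _ in range(8)] for _ in range(50)]
all_ok = all(lemma_18_6_1_holds(a, r) for a in samples for r in range(1, 9))
```

The brute force over all $I$ is only meant for tiny $M$; it makes the role of ties explicit, since any maximizing $I$ (not only the first $r$ indices after sorting) must satisfy the conclusion.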


The main result of this section is as follows: 
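Proposition 18.6.2 With probability $\ge1-L\exp(-100p)$, the following happens. Consider any $A\ge1$ and a partition $(B_k)_{k\ge4}$ of $G$ such that $B_k$ is $\mathcal P_k$-measurable, and for each $C\in\mathcal C_k:=\{C\in\mathcal P_k;\ C\subset B_k\}$, consider a number $z(C)$ such that the following properties hold:

$$\sum_{k\ge4}\sum_{C\in\mathcal C_k}2^{s(k)}z(C)\le L2^{3p}A; \tag{18.85}$$

$$k\ge4,\ C\in\mathcal C_k\ \Rightarrow\ z(C)\le N_{k+1}. \tag{18.86}$$

Then

$$\sum_{k\ge4}\sum_{C\in\mathcal C_k}z(C)\sum_{t\in C}|Y_t|\le L\sqrt{m_0}\,2^{3p}A. \tag{18.87}$$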


The basic problem to bound a sum as in (18.87) is that a few of the r.v.s $W_C:=\sum_{t\in C}|Y_t|$ might be quite large due to random fluctuations. The strategy to take care of this is at kindergarten level. Given a subset $I$ of $\mathcal P_k$, we write

$$\sum_{C\in\mathcal C_k}z(C)W_C\le\max_{C\in\mathcal C_k}z(C)\sum_{C\in I}W_C+\sum_{C\in\mathcal C_k}z(C)\max_{C\in\mathcal C_k\setminus I}W_C. \tag{18.88}$$

Next, consider an integer $r$ and the quantity

$$A_{r,k}:=\max\Bigl\{\sum_{C\in I}W_C;\ I\subset\mathcal P_k,\ \operatorname{card}I=r\Bigr\}. \tag{18.89}$$

In particular, $A_{1,k}=\max_{C\in\mathcal P_k}W_C$, and

$$\sum_{C\in\mathcal C_k}z(C)W_C\le A_{1,k}\sum_{C\in\mathcal C_k}z(C). \tag{18.90}$$

If the set $I$ is such that the maximum is attained in (18.89), then by Lemma 18.6.1, we have $W_C\le A_{r,k}/r$ for $C\notin I$, and (18.88) implies

$$\sum_{C\in\mathcal C_k}z(C)W_C\le A_{r,k}\max_{C\in\mathcal C_k}z(C)+\frac{A_{r,k}}r\sum_{C\in\mathcal C_k}z(C). \tag{18.91}$$


We will prove (18.87) by application of these inequalities for a suitable value $r=r_k$. The probabilistic part of the proof is to control from above the random quantities $A_{r,k}$. This is done by application of the union bound in the most brutish manner. The wonder is that everything fits so well in the computations.
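The deterministic part of this strategy, inequality (18.91), can be checked numerically. The sketch below uses hypothetical names and random data in place of the $W_C$ and $z(C)$. It takes $I$ to be the $r$ cells with the largest $W_C$, which attains the maximum in (18.89), and compares both sides of (18.91).

```python
import random


def rhs_18_91(W, z, r):
    """A_{r,k} * max z(C) + (A_{r,k} / r) * sum z(C), with A_{r,k} the sum of the r largest W."""
    A_r = sum(sorted(W, reverse=True)[:r])
    return A_r * max(z.values()) + (A_r / r) * sum(z.values())


random.seed(1)
violations = 0
for _ in range(500):
    n = random.randint(1, 30)
    W = [random.expovariate(1.0) for _ in range(n)]         # stands for W_C, C in P_k
    cells = random.sample(range(n), random.randint(1, n))   # indices of the cells of C_k
    z = {c: random.uniform(0.1, 5.0) for c in cells}        # stands for z(C), C in C_k
    r = random.randint(1, n)
    lhs = sum(z[c] * W[c] for c in cells)
    if lhs > rhs_18_91(W, z, r) + 1e-9:
        violations += 1
```

Splitting $\mathcal C_k$ into the cells inside and outside the maximizing $I$ is exactly (18.88); the sketch only confirms that no random instance violates the resulting bound.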

To carry out this program, we need first to understand the properties of this family $(Y_t)$ of r.v.s and of other related entities. This is the motivation behind the following definition:


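Definition 18.6.3 Consider a finite set $V$. We say that a family $(Y_v)_{v\in V}$ is of type $\mathcal B(N)$ if there exist a probability space $(\Omega,\theta)$, i.i.d. r.v.s $(W_i)_{i\le N}$ valued in $\Omega$ of law $\theta$, and functions $\psi_v$ on $\Omega$, $|\psi_v|\le1$, with disjoint supports for different $v$, such that

$$\frac1{2\operatorname{card}V}\le\theta(\{\psi_v\ne0\})\le\frac2{\operatorname{card}V} \tag{18.92}$$

and for which

$$Y_v=\sum_{i\le N}\Bigl(\psi_v(W_i)-\int\psi_v\,d\theta\Bigr). \tag{18.93}$$

The family $(Y_t)_{t\in G}$ is of type $\mathcal B(N)$, with $\psi_t=\mathbf 1_{\{t\}}$ and $\Omega=G$ provided with the probability $\mu$ (the law of the $U_i$). The following basic bound simply follows from Bernstein's inequality: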
inequality: 


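Lemma 18.6.4 Consider any family $(Y_v)_{v\in V}$ of type $\mathcal B(N)$ and numbers $(\eta_v)_{v\in V}$ with $|\eta_v|\le1$. Then for $u>0$, we have

$$\mathbb P\Bigl(\Bigl|\sum_{v\in V}\eta_vY_v\Bigr|\ge u\Bigr)\le2\exp\Bigl(-\frac1L\min\Bigl(u,\frac{u^2\operatorname{card}V}{N\sum_{v\in V}\eta_v^2}\Bigr)\Bigr). \tag{18.94}$$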
Proof Consider the r.v.s $W_i$ and the functions $\psi_v$ as in Definition 18.6.3. We define

$$S_i=\sum_{v\in V}\eta_v\psi_v(W_i).$$

Since the functions $\psi_v$ have disjoint supports, we have $|S_i|\le1$ and also

$$\mathbb E(S_i-\mathbb ES_i)^2\le\mathbb ES_i^2\le\frac2{\operatorname{card}V}\sum_{v\in V}\eta_v^2.$$

Since $\sum_{v\in V}\eta_vY_v=\sum_{i\le N}(S_i-\mathbb ES_i)$, and since $|S_i|\le1$, (18.94) follows from Bernstein's inequality (4.44). □


We are now ready to write our basic bound. We shall use the following well-known elementary fact: for $1\le k\le n$, we have

$$\binom nk\le\Bigl(\frac{en}k\Bigr)^k. \tag{18.95}$$
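A quick numerical check of (18.95) for small $n$. This is only a sketch; the bound itself is classical, and the function name is hypothetical.

```python
from math import comb, e


def binomial_bound_holds(n_max):
    """Check comb(n, k) <= (e * n / k) ** k for all 1 <= k <= n <= n_max."""
    return all(
        comb(n, k) <= (e * n / k) ** k
        for n in range(1, n_max + 1)
        for k in range(1, n + 1)
    )
```

For example, `binomial_bound_holds(80)` returns `True`; Python compares the exact integer `comb(n, k)` with the floating-point right-hand side directly.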


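Lemma 18.6.5 For each $u>0$ and $k$, $r\le\operatorname{card}\mathcal P_k$, we have

$$\mathbb P(A_{r,k}\ge u)\le2^{r2^{s(k)}+1}\Bigl(\frac{e2^{3p}}r\Bigr)^r\exp\Bigl(-\frac1L\min\Bigl(u,\frac{u^2}{r2^{s(k)}m_0}\Bigr)\Bigr). \tag{18.96}$$

Proof By definition of $A_{r,k}$, when $A_{r,k}\ge u$, we can find a subset $I\subset\mathcal P_k$ with $\operatorname{card}I=r$ and for $t\in C\in I$ a number $\varepsilon_t=\pm1$ such that

$$\sum_{C\in I}\sum_{t\in C}\varepsilon_tY_t\ge u. \tag{18.97}$$

We bound the probability of this event by the union bound. Since $\operatorname{card}C=2^{s(k)}$ for $C\in\mathcal P_k$ and $\operatorname{card}\mathcal P_k=2^{3p-s(k)}\le2^{3p}$, the factors in front of the exponential in (18.96) are a bound for the possible number of choices of $I$ and the $\varepsilon_t$. To bound the probability that, given $I$ and the $\varepsilon_t$, (18.97) occurs, we use the bound (18.94) for $V=G$ with $\eta_t=0$ if $t\notin\bigcup_{C\in I}C$ and $\eta_t=\varepsilon_t$ if $t\in\bigcup_{C\in I}C$, so that $\sum_{t\in G}\eta_t^2=r2^{s(k)}$ and $\operatorname{card}V=2^{3p}$. Finally, we use that by (18.31), we have $N\le2m_02^{3p}$ to reach the bound (18.96). □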


The rest of the argument is hand-to-hand combat with the inequality (18.96) to choose the appropriate values of $r=r_k$ and $u$ while keeping the right-hand side at most $L\exp(-100p)$. It is convenient to distinguish the cases of “small $k$” and “large $k$”. These are the cases $4\le k\le k_0$ and $k>k_0$ for an integer $k_0\ge4$ which we define now. If $2^{3p}<N_6$, we define $k_0=4$. Otherwise, we define $k_0$ as
the largest integer with $N_{k_0+2}\le2^{3p}$, so that $N_{k_0+3}>2^{3p}$ and thus

$$\frac pL\le2^{k_0}\le Lp. \tag{18.98}$$

Recalling (18.27), as $N$ becomes large, so does $p$, and the condition $2^{k_0}\le Lp$ then forces $k_0\le p$. Recalling the quantity $s(k)$ of (18.54) and that $s(k)=k$ for $k\le p$ by (18.56), we then have $s(k)=k$ for $k\le k_0$.
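The definition of $k_0$ can be made concrete. The sketch below assumes that $N_k=2^{2^k}$ (the usual form of this sequence in the book, consistent with the relation $N_{q^*+2}=N_{q^*+1}^2$ used in Section 18.9), so that the condition $N_{k+2}\le2^{3p}$ reads $2^{k+2}\le3p$. It computes $k_0$ in this way (the function name is hypothetical) and checks the two-sided bound behind (18.98), namely $3p/8<2^{k_0}\le3p/4$ whenever $2^{3p}\ge N_6$.

```python
def k0(p):
    """k_0 of the text, assuming N_k = 2**(2**k), so that N_{k+2} <= 2**(3p) iff 2**(k+2) <= 3p."""
    if 3 * p < 2 ** 6:              # 2^(3p) < N_6: the text sets k_0 = 4
        return 4
    k = 4                           # here N_6 <= 2^(3p), so k = 4 satisfies N_{k+2} <= 2^(3p)
    while 2 ** (k + 3) <= 3 * p:    # N_{k+3} <= 2^(3p): k is not yet the largest admissible value
        k += 1
    return k
```

For instance `k0(1000)` returns 9: indeed $2^{11}=2048\le3000<4096=2^{12}$.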

We first take care of the values $k>k_0$. This is the easiest case because for these values, the r.v.s $W_C=\sum_{t\in C}|Y_t|$ are not really larger than their expectations, as the following shows:

Lemma 18.6.6 With probability $\ge1-L\exp(-100p)$, for $k_0<k\le p$, we have

$$A_{1,k}\le L2^{s(k)}\sqrt{m_0}. \tag{18.99}$$

Proof We recall that $2^{s(k)}=2^k\ge2^{k_0}\ge p/L$ by (18.98). Consider a parameter $B\ge1$. Using (18.96) for $r=1$ and $u=B\sqrt{m_0}\,2^{s(k)}$, we obtain

$$\mathbb P\bigl(A_{1,k}\ge B\sqrt{m_0}\,2^{s(k)}\bigr)\le2^{2^{s(k)}+1}e2^{3p}\exp\Bigl(-\frac{B2^{s(k)}}L\Bigr)\le\exp\Bigl(\Bigl(L-\frac BL\Bigr)2^{s(k)}\Bigr).$$

If $B$ is a large enough constant, then with probability $\ge1-L\exp(-100p)$, for each $k_0<k\le p$, we have $A_{1,k}\le B2^{s(k)}\sqrt{m_0}$. □
We now take care of the small values of $k$.

Lemma 18.6.7 With probability $\ge1-L\exp(-100p)$, for $4\le k\le k_0$, we have

$$A_{r_k,k}\le Lr_k2^k\sqrt{m_0}, \tag{18.100}$$

where $r_k=\lfloor2^{3p}/N_{k+2}\rfloor$.

Proof Since $4\le k\le k_0$, by definition of $k_0$ we have $N_{k+2}\le2^{3p}$ and thus $r_k\ge1$. Since $\lfloor x\rfloor\ge1\Rightarrow\lfloor x\rfloor\ge x/2$, this yields $r_k\ge2^{3p}/(2N_{k+2})$. Consequently,

$$\Bigl(\frac{e2^{3p}}{r_k}\Bigr)^{r_k}\le(2eN_{k+2})^{r_k}\le\exp(L2^kr_k).$$

Thus, given a parameter $B\ge1$, and choosing $u=B\sqrt{m_0}\,r_k2^k$ in (18.96), we obtain

$$\mathbb P\bigl(A_{r_k,k}\ge B\sqrt{m_0}\,r_k2^k\bigr)\le\exp\Bigl(\Bigl(L-\frac BL\Bigr)r_k2^k\Bigr). \tag{18.101}$$

Now, since the sequence $(2^k/N_{k+2})$ decreases, using (18.98) in the last inequality,

$$r_k2^k\ge\frac{2^{3p+k}}{2N_{k+2}}\ge2^{3p}\frac{2^{k_0}}{2N_{k_0+2}}\ge\frac{2^{k_0}}2\ge\frac pL. \tag{18.102}$$

Taking for $B$ a large enough constant, (18.101) and (18.102) imply the result. □


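Proof of Proposition 18.6.2 We assume that the events described in Lemmas 18.6.7 and 18.6.6 occur, and we prove (18.87). First, using (18.90),

$$\sum_{k>k_0}\sum_{C\in\mathcal C_k}z(C)W_C\le\sum_{k>k_0}A_{1,k}\sum_{C\in\mathcal C_k}z(C),$$

so that using (18.99), we have

$$\sum_{k>k_0}\sum_{C\in\mathcal C_k}z(C)W_C\le L\sqrt{m_0}\sum_{k>k_0}2^{s(k)}\sum_{C\in\mathcal C_k}z(C). \tag{18.103}$$

Next, for $k\le k_0$, by (18.86), we have $z(C)\le N_{k+1}$ for $C\in\mathcal C_k$, so that (18.91) implies

$$\sum_{C\in\mathcal C_k}z(C)W_C\le A_{r,k}N_{k+1}+\frac{A_{r,k}}r\sum_{C\in\mathcal C_k}z(C).$$

Using this with $r=r_k$, and using (18.100) (and since $r_k\le2^{3p}/N_{k+2}$), we obtain

$$\sum_{C\in\mathcal C_k}z(C)\sum_{t\in C}|Y_t|\le L\sqrt{m_0}\,2^{3p+k}\frac{N_{k+1}}{N_{k+2}}+L2^k\sqrt{m_0}\sum_{C\in\mathcal C_k}z(C). \tag{18.104}$$

We sum these inequalities for $4\le k\le k_0$. The first terms add up to at most $L\sqrt{m_0}\,2^{3p}\le L\sqrt{m_0}\,2^{3p}A$, because $N_{k+2}=N_{k+1}^2$ makes $\sum_{k\ge4}2^kN_{k+1}/N_{k+2}\le L$, and because $A\ge1$. Since $s(k)=k$ for $k\le k_0$, the second terms, together with (18.103), are controlled by (18.85). This proves (18.87). □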
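The first term on the right of (18.104) is harmless once summed over $k$, because $N_{k+1}/N_{k+2}=2^{-2^{k+1}}$ when $N_k=2^{2^k}$ (an assumption of this sketch, consistent with $N_{k+2}=N_{k+1}^2$). The exact computation below uses a hypothetical function name. It shows that the partial sum of $2^kN_{k+1}/N_{k+2}$ over $4\le k\le8$ is already below $10^{-8}$; the later terms are astronomically smaller.

```python
from fractions import Fraction


def N(k):
    """N_k = 2^(2^k) (assumed form of the sequence)."""
    return 2 ** (2 ** k)


def ratio_series(k_max):
    """Exact partial sum of 2^k * N_{k+1} / N_{k+2} over 4 <= k <= k_max."""
    return sum(Fraction(2 ** k * N(k + 1), N(k + 2)) for k in range(4, k_max + 1))
```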


Let us denote by $L_0$ the constant in (18.60).

Definition 18.6.8 Consider $A\ge1$. A function $h:G\to\mathbb R$ belongs to $\mathcal S^*(A)$ if we can find a partition $(B_k)_{k\ge4}$ of $G$, where $B_k$ is $\mathcal P_k$-measurable, with the following properties, where we recall the notation $\mathcal C_k:=\{C\in\mathcal P_k;\ C\subset B_k\}$ of (18.70). First,

$$\text{for each }k\ge4,\ h\text{ is constant on each rectangle }C\in\mathcal C_k. \tag{18.105}$$

Next, for each $C\in\mathcal C_k$ we can find a number $z(C)$ satisfying the following properties:

$$\sum_{k\ge4}\sum_{C\in\mathcal C_k}2^{s(k)}z(C)\le L_02^{3p}A \tag{18.106}$$

and

$$k\ge4,\ C\in\mathcal C_k\ \Rightarrow\ N_k\le z(C)\le N_{k+1}. \tag{18.107}$$

Moreover, if $C\in\mathcal C_k$ and if $C'\in\mathcal P_k$ is adjacent to $C$ and such that for $k'>k$ we have $C'\not\subset B_{k'}$, then

$$t\in C,\ t'\in C'\ \Rightarrow\ |h(t)-h(t')|\le z(C). \tag{18.108}$$

Let us stress the difference between (18.108) and (18.62). In (18.108), $C'$ is adjacent to $C$ as in (18.62) but satisfies the further condition that $C'\not\subset B_{k'}$ for $k'>k$. Equivalently, $C'\subset\bigcup_{\ell\le k}B_\ell$.

In the rest of this chapter, we shall prove the following:

Theorem 18.6.9 Consider an i.i.d. sequence of r.v.s $(U_i)_{i\le N}$ distributed like $\mu$. Then with probability $\ge1-L\exp(-100p)$, the following occurs: for $A\ge1$, whenever $h\in\mathcal S^*(A)$,

$$\Bigl|\sum_{i\le N}\Bigl(h(U_i)-\int h\,d\mu\Bigr)\Bigr|\le L\sqrt{m_0}\,2^{3p}A. \tag{18.109}$$

Let us observe right away the following fundamental identity, which is obvious from the definition of $Y_t$:

$$\sum_{i\le N}\Bigl(h(U_i)-\int h\,d\mu\Bigr)=\sum_{t\in G}h(t)Y_t. \tag{18.110}$$
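Identity (18.110) is a rearrangement of a finite sum, and a simulation confirms it. In this sketch, a toy 12-point set stands in for $G$, and $\mu$ is a random probability on it. Both are illustrative assumptions.

```python
import random

random.seed(2)
G = list(range(12))                          # toy stand-in for the finite set G
weights = [random.random() for _ in G]
total = sum(weights)
mu = [w / total for w in weights]            # a probability mu on G
h = [random.uniform(-1.0, 1.0) for _ in G]   # an arbitrary function h on G
N = 500
U = random.choices(G, weights=mu, k=N)       # i.i.d. U_1, ..., U_N with law mu

# Y_t = card{i <= N : U_i = t} - N mu({t}), as in (18.84)
Y = [U.count(t) - N * mu[t] for t in G]

integral_h = sum(h[t] * mu[t] for t in G)    # int h dmu
lhs = sum(h[u] - integral_h for u in U)      # sum_{i <= N} (h(U_i) - int h dmu)
rhs = sum(h[t] * Y[t] for t in G)            # sum_{t in G} h(t) Y_t
```

Up to rounding, `lhs` and `rhs` coincide, and `sum(Y)` vanishes because $\mu$ is a probability. The latter fact is used again in the proof of Lemma 18.9.1.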


Proof of Theorem 18.4.1 Consider a function $h\in\mathcal S(A)$, the sets $B_k$, and the numbers $z(C)$ as provided by Theorem 18.5.2. Let us define the function $h^*$ on $G$ as follows: if $C\in\mathcal C_k$, then $h^*$ is constant on $C$ and $\int_Ch\,d\mu=\int_Ch^*\,d\mu$. In other words, the constant value $h^*(C)$ of $h^*$ on $C$ is the average value of $h$ on $C$:

$$h^*(C)=\frac1{\mu(C)}\int_Ch\,d\mu. \tag{18.111}$$

Using (18.62) for $C'=C$, averaging over $t'\in C$ and using Jensen's inequality yields that $|h-h^*|\le z(C)$ on $C$, so that

$$\Bigl|\sum_{t\in C}h(t)Y_t-\sum_{t\in C}h^*(t)Y_t\Bigr|\le z(C)\sum_{t\in C}|Y_t|,$$

and by summation,

$$\Bigl|\sum_{t\in G}h(t)Y_t-\sum_{t\in G}h^*(t)Y_t\Bigr|\le\sum_{k\ge4}\sum_{C\in\mathcal C_k}z(C)\sum_{t\in C}|Y_t|.$$
According to Proposition 18.6.2, with probability $\ge1-L\exp(-100p)$, the last sum is $\le L\sqrt{m_0}\,2^{3p}A$, so that with the same probability, for each function $h\in\mathcal S(A)$, we have

$$\Bigl|\sum_{t\in G}h(t)Y_t\Bigr|\le\Bigl|\sum_{t\in G}h^*(t)Y_t\Bigr|+L\sqrt{m_0}\,2^{3p}A.$$

Using (18.110) for both $h$ and $h^*$, we then have

$$\Bigl|\sum_{i\le N}\Bigl(h(U_i)-\int h\,d\mu\Bigr)\Bigr|\le\Bigl|\sum_{i\le N}\Bigl(h^*(U_i)-\int h^*\,d\mu\Bigr)\Bigr|+L\sqrt{m_0}\,2^{3p}A.$$

Therefore, using Theorem 18.6.9, it suffices to prove that $h^*\in\mathcal S^*(A)$. Using the same sets $B_k$ and the same values $z(C)$ for $h^*$ as for $h$, it suffices to prove (18.108). Consider $C$ and $C'$ as in this condition and $t\in C$, $t'\in C'$. Consider $k'$ such that $t'\in C''\in\mathcal C_{k'}$. Then we have $k'\le k$, for otherwise $C'\subset C''\subset B_{k'}$, which contradicts the fact that we assume that $C'\not\subset B_{k'}$. Thus, $C''\subset C'$, and consequently, by (18.62) for $\rho\in C$ and $\rho'\in C''$, we have $|h(\rho)-h(\rho')|\le z(C)$. Recalling (18.111), and using Jensen's inequality, proves that $|h^*(C)-h^*(C'')|\le z(C)$, which concludes the proof since $|h^*(t)-h^*(t')|=|h^*(C)-h^*(C'')|$. □


18.7 Haar Basis Expansion 


The strategy to prove Theorem 18.6.9 is very simple. We write an expansion $h=\sum_va_v(h)v$ along the Haar basis, where $a_v(h)$ is a number and $v$ is a function belonging to the Haar basis. (See the details in (18.117).) We then write

$$\Bigl|\sum_{i\le N}\Bigl(h(U_i)-\int h\,d\mu\Bigr)\Bigr|\le\sum_v|a_v(h)|\Bigl|\sum_{i\le N}\Bigl(v(U_i)-\int v\,d\mu\Bigr)\Bigr|=\sum_v|a_v(h)||Y_v|, \tag{18.112}$$

where

$$Y_v=\sum_{i\le N}\Bigl(v(U_i)-\int v\,d\mu\Bigr). \tag{18.113}$$

The first task is to use the information $h\in\mathcal S^*(A)$ to bound the numbers $|a_v(h)|$. This is done in Proposition 18.7.1, and the final work of controlling the sum in (18.112) is the purpose of the last two sections.
We first describe the Haar basis. For $1\le r\le p+1$, we define the class $H(r)$ of functions on $\{1,\dots,2^p\}$ as follows:

$$H(p+1)\text{ consists of the function that is constant equal to }1. \tag{18.114}$$

For $1\le r\le p$, $H(r)$ consists of the $2^{p-r}$ functions $f_{i,r}$ for $0\le i<2^{p-r}$ that are defined as follows:

$$f_{i,r}(\sigma)=\begin{cases}1&\text{if }i2^r<\sigma\le i2^r+2^{r-1},\\-1&\text{if }i2^r+2^{r-1}<\sigma\le(i+1)2^r,\\0&\text{otherwise.}\end{cases} \tag{18.115}$$

In this manner, we define a total of $2^p$ functions. These functions are orthogonal in $L^2(\theta)$, where $\theta$ is the uniform probability on $\{1,\dots,2^p\}$, and thus form a complete orthogonal basis of this space. Let us note that $|f_{i,r}|\in\{0,1\}$. Also, if $f\in H(p+1)$, we have $\int f^2\,d\theta=1$, while if $r\le p$, we have

$$\int f_{i,r}^2\,d\theta=2^{-p+r}. \tag{18.116}$$

Given three functions $f_1,f_2,f_3$ on $\{1,\dots,2^p\}$, denote by $f_1\otimes f_2\otimes f_3$ the function on $G=\{1,\dots,2^p\}^3$ given by $f_1\otimes f_2\otimes f_3((\sigma_1,\sigma_2,\sigma_3))=f_1(\sigma_1)f_2(\sigma_2)f_3(\sigma_3)$. For $1\le q_1,q_2,q_3\le p+1$, let us denote by $V(q_1,q_2,q_3)$ the set of functions of the type $v=f_1\otimes f_2\otimes f_3$ where $f_j\in H(q_j)$ for $j\le3$. The functions $v\in V(q_1,q_2,q_3)$ have disjoint supports, and for $v\in V(q_1,q_2,q_3)$, we have $|v|\in\{0,1\}$. As $q_1$, $q_2$, and $q_3$ take all possible values, these functions form a complete orthogonal system of $L^2(\nu)$, where $\nu$ denotes the uniform probability on $G$. Consequently, given any function $h$ on $G$, we have the expansion

$$h=\sum_{1\le q_1,q_2,q_3\le p+1}\ \sum_{v\in V(q_1,q_2,q_3)}a_v(h)v, \tag{18.117}$$

where

$$a_v(h)=\frac{\int hv\,d\nu}{\int v^2\,d\nu}. \tag{18.118}$$

The decomposition (18.117) then implies

$$\Bigl|\sum_{t\in G}h(t)Y_t\Bigr|\le\sum_{1\le q_1,q_2,q_3\le p+1}\ \sum_{v\in V(q_1,q_2,q_3)}|a_v(h)|\Bigl|\sum_{t\in G}v(t)Y_t\Bigr|. \tag{18.119}$$
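To see (18.117) and (18.118) at work, consider the sketch below. It again uses hypothetical helper names and a small $p$, so that brute force is cheap. It forms all tensor products $v=f_1\otimes f_2\otimes f_3$ on $G=\{1,\dots,2^p\}^3$ and computes $a_v(h)$ by (18.118) for a random $h$. It then checks that the expansion (18.117) reproduces $h$.

```python
import itertools
import random


def haar_1d(p):
    """All functions of H(1), ..., H(p+1) on {1, ..., 2^p}, as lists indexed by sigma - 1."""
    n = 2 ** p
    out = [[1.0] * n]                                    # H(p+1)
    for r in range(1, p + 1):
        half = 2 ** (r - 1)
        for i in range(2 ** (p - r)):
            f = [0.0] * n
            for s in range(i * 2 ** r, i * 2 ** r + half):
                f[s] = 1.0                               # i 2^r < sigma <= i 2^r + 2^(r-1)
            for s in range(i * 2 ** r + half, (i + 1) * 2 ** r):
                f[s] = -1.0                              # i 2^r + 2^(r-1) < sigma <= (i+1) 2^r
            out.append(f)
    return out


p = 2
basis = haar_1d(p)
G = list(itertools.product(range(2 ** p), repeat=3))
nu = 1.0 / len(G)                                        # uniform probability on G

random.seed(3)
h = {t: random.uniform(-1.0, 1.0) for t in G}
recon = {t: 0.0 for t in G}
for f1, f2, f3 in itertools.product(basis, repeat=3):
    v = {t: f1[t[0]] * f2[t[1]] * f3[t[2]] for t in G}  # v = f1 (x) f2 (x) f3
    a_v = sum(h[t] * v[t] * nu for t in G) / sum(v[t] ** 2 * nu for t in G)  # (18.118)
    for t in G:
        recon[t] += a_v * v[t]                           # partial sums of (18.117)
```

With $p=2$ there are $4^3=64$ tensor products on a 64-point set, so completeness of the orthogonal system is what makes `recon` equal to `h` up to rounding.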
This will be our basic tool to prove Theorem 18.6.9, keeping (18.110) in mind. Fixing $q_1$, $q_2$, and $q_3$, the main effort will be to find competent bounds for

$$\sum_{v\in V(q_1,q_2,q_3)}|a_v(h)|\Bigl|\sum_{t\in G}v(t)Y_t\Bigr|. \tag{18.120}$$

Since we think of $q_1$, $q_2$, and $q_3$ as fixed, we lighten notation by writing

$$V:=V(q_1,q_2,q_3);\qquad v\in V\ \Rightarrow\ Y_v=\sum_{t\in G}v(t)Y_t. \tag{18.121}$$

Let us set $q_j^*=\min(q_j,p)$. Since $q_j\le p+1$, we have $q_j-1\le q_j^*\le q_j$, so that

$$2^{3p-q_1-q_2-q_3}\le\operatorname{card}V=2^{3p-q_1^*-q_2^*-q_3^*}\le L2^{3p-q_1-q_2-q_3}. \tag{18.122}$$

Also, recalling (18.116), we obtain that for $v\in V(q_1,q_2,q_3)$

$$\int v^2\,d\nu=2^{q_1^*+q_2^*+q_3^*-3p}\ge\frac1L2^{q_1+q_2+q_3-3p}. \tag{18.123}$$
= Lb 


Recall Definition 18.6.8 of the class $\mathcal S^*(A)$. The next task is, given a function $h\in\mathcal S^*(A)$, to gather information about the coefficients $a_v(h)$. This information depends on the information we have about $h$, that is, the sets $B_k$ and the coefficients $z(C)$. We think of $h$ as fixed, and for $k\ge4$, we consider the function $R_k$ on $G$ defined as follows:

$$R_k=0\text{ outside }B_k. \tag{18.124}$$

$$\text{If }C\in\mathcal C_k,\text{ then }R_k\text{ is constant}=z(C)\text{ on }C. \tag{18.125}$$

These functions have disjoint supports. They will be essential for the rest of this chapter. We may think of them as the parameters which govern “the size of $h$”. Since $\nu(C)=2^{s(k)-3p}$ for $C\in\mathcal P_k$,

$$\int R_k\,d\nu=2^{s(k)-3p}\sum_{C\in\mathcal C_k}z(C), \tag{18.126}$$

and thus from (18.106),

$$\sum_{k\ge4}\int R_k\,d\nu\le LA. \tag{18.127}$$

Our basic bound is as follows: 
Proposition 18.7.1 Consider $q_1$, $q_2$, $q_3$, and $q=q_1+q_2+q_3$. Consider $j\le3$ with $q_j\le p$. Define $k_j$ as the largest $k$ for which $n_j(k)\le q_j$. Then for any $v\in V=V(q_1,q_2,q_3)$, we have

$$|a_v(h)|\le L2^{3p-q+q_j}\sum_{4\le\ell\le k_j}2^{-n_j(\ell)}\int|v|R_\ell\,d\nu. \tag{18.128}$$

Since $|v|\in\{0,1\}$, $\int|v|R_\ell\,d\nu$ is simply the integral of $R_\ell$ on the support of $v$. A crucial fact here is that these supports are disjoint, so that

$$\sum_{v\in V}\int|v|R_\ell\,d\nu\le\int R_\ell\,d\nu.$$

The reason why only terms for $4\le\ell\le k_j$ appear in (18.128) is closely related to the fact that $h\in\mathcal S^*(A)$ is constant on the elements $C$ of $\mathcal C_k$ and that $a_v(\mathbf 1_C)=0$ for $C\in\mathcal P_k$ as soon as $q_j<n_j(k)$ and $q_j\le p$ for some $j\le3$. Observe also that Proposition 18.7.1 offers different bounds, one for each value of $j$ with $q_j\le p$. We will choose properly between these bounds.

In view of (18.118) and (18.123), to prove Proposition 18.7.1, it suffices to show that when $q_j\le p$, we have

$$\Bigl|\int vh\,d\nu\Bigr|\le2^{q_j}\sum_{4\le\ell\le k_j}2^{-n_j(\ell)}\int|v|R_\ell\,d\nu. \tag{18.129}$$


The proof relies on a simple principle to which we turn now. We say that a subset of $\mathbb N^*$ is a dyadic interval if it is of the type $\{r2^q+1,\dots,(r+1)2^q\}$ for some integers $r,q\ge0$. The support of a function $f_{i,r}$ of (18.115) is a dyadic interval. Given two dyadic intervals $I$ and $J$ with $\operatorname{card}J\ge\operatorname{card}I$, we have

$$I\cap J\ne\emptyset\ \Rightarrow\ I\subset J.$$
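The nesting property of dyadic intervals is easy to confirm by brute force. In this sketch (hypothetical helper names), a dyadic interval is represented as a Python set.

```python
def dyadic(r, q):
    """The dyadic interval {r 2^q + 1, ..., (r + 1) 2^q}."""
    return set(range(r * 2 ** q + 1, (r + 1) * 2 ** q + 1))


def nesting_holds(max_q=5, max_r=20):
    """Check that I meets J only if I is contained in J, whenever card J >= card I."""
    intervals = [dyadic(r, q) for q in range(max_q + 1) for r in range(max_r)]
    return all(I <= J for I in intervals for J in intervals if len(J) >= len(I) and I & J)
```

For example, `dyadic(1, 2)` is `{5, 6, 7, 8}`, and it meets `dyadic(0, 3)` = `{1, ..., 8}` only by being contained in it.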


Lemma 18.7.2 Consider one of the functions $f_{i,r}$ of (18.115), and call its support $I$. Consider a partition $\mathcal Q$ of $I$ into dyadic intervals. Assume that to each $J\in\mathcal Q$ is associated a number $z(J)$. Consider a function $g:I\to\mathbb R$. Assume that whenever $J,J'\in\mathcal Q$ are adjacent and $\operatorname{card}J\ge\operatorname{card}J'$,

$$\sigma\in J,\ \sigma'\in J'\ \Rightarrow\ |g(\sigma)-g(\sigma')|\le z(J). \tag{18.130}$$

Then

$$\Bigl|\sum_{\sigma\in I}f_{i,r}(\sigma)g(\sigma)\Bigr|\le\operatorname{card}I\sum_{J\in\mathcal Q}z(J). \tag{18.131}$$

Let us insist that (18.130) is required in particular if $J=J'\in\mathcal Q$.
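Lemma 18.7.2 can also be tested numerically. The sketch below uses hypothetical helper names and random data. It picks a random partition $\mathcal Q$ of the support $I$ of $f_{i,r}$ into dyadic intervals and a random $g$. It also picks the smallest numbers $z(J)$ for which (18.130) holds, reading “adjacent” as “consecutive” in $\{1,\dots,2^p\}$. It then checks (18.131).

```python
import random


def dyadic_partition(start, size, rng):
    """Randomly split {start + 1, ..., start + size} (size a power of 2) into dyadic
    intervals, returned from left to right as lists of integers."""
    if size > 1 and rng.random() < 0.6:
        half = size // 2
        return dyadic_partition(start, half, rng) + dyadic_partition(start + half, half, rng)
    return [list(range(start + 1, start + size + 1))]


def lemma_18_7_2_holds(i, r, rng):
    start, size = i * 2 ** r, 2 ** r                       # I = support of f_{i,r}
    Q = dyadic_partition(start, size, rng)
    g = {s: rng.uniform(-3.0, 3.0) for J in Q for s in J}
    f = {s: (1 if s <= start + size // 2 else -1) for s in g}  # f_{i,r} on I, see (18.115)
    z = []
    for m, J in enumerate(Q):
        # smallest z(J) satisfying (18.130): J itself and its neighbours J' with card J' <= card J
        others = [Q[m2] for m2 in (m - 1, m + 1) if 0 <= m2 < len(Q) and len(Q[m2]) <= len(J)]
        z.append(max(abs(g[s] - g[t]) for K in [J] + others for s in J for t in K))
    lhs = abs(sum(f[s] * g[s] for s in g))
    return lhs <= size * sum(z) + 1e-9                     # (18.131), since card I = 2^r
```

Choosing the smallest admissible $z(J)$ makes the test as demanding as possible; any larger choice satisfying (18.130) only weakens the inequality being checked.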
Proof We first prove that for all $\sigma,\sigma'$ in $I$, we have

$$|g(\sigma)-g(\sigma')|\le2\sum_{J\in\mathcal Q}z(J). \tag{18.132}$$

Without loss of generality, we may assume that $\sigma<\sigma'$. Let us enumerate $\mathcal Q$ as $J_1,J_2,\dots$ in a way that $J_{\ell+1}$ is immediately to the right of $J_\ell$. If for some $J\in\mathcal Q$ we have $\sigma,\sigma'\in J$, then (18.130) implies $|g(\sigma)-g(\sigma')|\le z(J)$ and hence (18.132). Otherwise, $\sigma\in J_{\ell_1}$ and $\sigma'\in J_{\ell_2}$ for some $\ell_1<\ell_2$. For $\ell_1<\ell<\ell_2$, consider a point $\sigma_\ell\in J_\ell$. Set $\sigma_{\ell_1}=\sigma$ and $\sigma_{\ell_2}=\sigma'$. Then

$$|g(\sigma')-g(\sigma)|=|g(\sigma_{\ell_2})-g(\sigma_{\ell_1})|\le\sum_{\ell_1\le\ell<\ell_2}|g(\sigma_{\ell+1})-g(\sigma_\ell)|. \tag{18.133}$$

Moreover, it follows from (18.130) (distinguishing whether $\operatorname{card}J_{\ell+1}\ge\operatorname{card}J_\ell$ or the other way around) that

$$|g(\sigma_{\ell+1})-g(\sigma_\ell)|\le z(J_\ell)+z(J_{\ell+1}), \tag{18.134}$$

and combining with (18.133) proves (18.132). Letting $I_1=\{i2^r+1,\dots,i2^r+2^{r-1}\}$ and $I_2=\{i2^r+2^{r-1}+1,\dots,(i+1)2^r\}$, we have $I=I_1\cup I_2$. Recalling that $I=\{i2^r+1,\dots,(i+1)2^r\}$, we have

$$\Bigl|\sum_{\sigma\in I}g(\sigma)f_{i,r}(\sigma)\Bigr|=\Bigl|\sum_{\sigma\in I_1}g(\sigma)-\sum_{\sigma\in I_2}g(\sigma)\Bigr|=\Bigl|\sum_{\sigma\in I_1}\bigl(g(\sigma)-g(\sigma+2^{r-1})\bigr)\Bigr|,$$

and using (18.132), this concludes the proof since $\operatorname{card}I_1=\operatorname{card}I/2$. □
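Proof of (18.129) Without loss of generality, we assume that $j=1$, so that $q_1\le p$. By definition of the class $V(q_1,q_2,q_3)$, $v$ is of the type $f_1\otimes f_2\otimes f_3$ for $f_j\in H(q_j)$. Also, $\nu=\nu_1\otimes\nu_2\otimes\nu_3$, where $\nu_j$ is the uniform probability on $\{1,\dots,2^p\}$. Therefore,

$$\Bigl|\int vh\,d\nu\Bigr|=\Bigl|\int\Bigl(\int f_1(\sigma)h(\sigma,\tau^2,\tau^3)\,d\nu_1(\sigma)\Bigr)f_2(\tau^2)f_3(\tau^3)\,d\nu_2(\tau^2)\,d\nu_3(\tau^3)\Bigr|\le\int\Bigl|\int f_1(\sigma)h(\sigma,\tau^2,\tau^3)\,d\nu_1(\sigma)\Bigr||f_2(\tau^2)||f_3(\tau^3)|\,d\nu_2(\tau^2)\,d\nu_3(\tau^3). \tag{18.135}$$

Let us fix $\tau^2$ and $\tau^3$ in $\{1,\dots,2^p\}$. We shall prove that

$$\Bigl|\int f_1(\sigma)h(\sigma,\tau^2,\tau^3)\,d\nu_1(\sigma)\Bigr|\le S:=2^{q_1}\sum_{4\le\ell\le k_1}2^{-n_1(\ell)}\int|f_1(\sigma)|R_\ell(\sigma,\tau^2,\tau^3)\,d\nu_1(\sigma). \tag{18.136}$$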
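Therefore, using Fubini's theorem, (18.136) yields

$$\int\Bigl|\int f_1(\sigma)h(\sigma,\tau^2,\tau^3)\,d\nu_1(\sigma)\Bigr||f_2(\tau^2)||f_3(\tau^3)|\,d\nu_2(\tau^2)\,d\nu_3(\tau^3)\le2^{q_1}\sum_{4\le\ell\le k_1}2^{-n_1(\ell)}\int|v|R_\ell\,d\nu,$$

and combining with (18.135) yields the result.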

We shall deduce (18.136) from Lemma 18.7.2. Recalling the definition of the class $V(q_1,q_2,q_3)$, $f_1$ is of the type $f_{i,r}$ given by (18.115) for $r=q_1\le p$ and a certain value of $i$. Let $I$ be the support of $f_1$, and note that $\int f_1(\sigma)\,d\nu_1(\sigma)=0$. Consider the map $\psi:\{1,\dots,2^p\}\to G$ given by $\psi(\sigma)=(\sigma,\tau^2,\tau^3)$, and let $I^*=\psi(I)$. Assume first

$$\exists\ell,\ n_1(\ell)>r=q_1,\ \exists C\in\mathcal C_\ell,\ C\cap I^*\ne\emptyset. \tag{18.137}$$

Since $C\in\mathcal P_\ell$, $J=\psi^{-1}(C)$ is a dyadic interval with $\operatorname{card}J=2^{n_1(\ell)}>2^{q_1}=\operatorname{card}I$, and since $I\cap J\ne\emptyset$ because $C\cap I^*\ne\emptyset$ by (18.137), we have $I\subset J$. Now $h$ is constant on $C$, so that the function $\sigma\mapsto h(\sigma,\tau^2,\tau^3)$ is constant on $I$. Since $\int f_1\,d\nu_1=0$ and $I$ is the support of $f_1$, the left-hand side of (18.136) is zero in this case. So we may assume that (18.137) fails, i.e.,

$$\forall\ell,\ \forall C\in\mathcal C_\ell,\ C\cap I^*\ne\emptyset\ \Rightarrow\ n_1(\ell)\le q_1. \tag{18.138}$$

We consider the partition $\mathcal Q$ of $I$ that consists of the sets of the type $\psi^{-1}(C)\cap I$, where, for some $\ell\ge4$, $C\in\mathcal C_\ell$ and $C\cap I^*\ne\emptyset$. When $J=\psi^{-1}(C)\in\mathcal Q$, we define

$$z(J):=z(C).$$

We now prove that (18.130) follows from (18.108). Consider $J,J'\in\mathcal Q$ which are adjacent with $\operatorname{card}J\ge\operatorname{card}J'$. Then $J=\psi^{-1}(C)$ and $J'=\psi^{-1}(C')$, where $C$ and $C'$ are adjacent and $C\in\mathcal C_\ell$, $C'\in\mathcal C_{\ell'}$. We claim that we may assume that $\ell\ge\ell'$. Since $\operatorname{card}J=2^{n_1(\ell)}$ and $\operatorname{card}J'=2^{n_1(\ell')}$, this is automatically the case if $\operatorname{card}J>\operatorname{card}J'$. If $\operatorname{card}J=\operatorname{card}J'$, it suffices if necessary to exchange the names of $J$ and $J'$. Thus, $C'\in\mathcal C_{\ell'}$ for some $\ell'\le\ell$, and in particular $C'\subset B_{\ell'}$, so that $C'\not\subset B_k$ for $k>\ell$. Then by (18.108), we have $|h(t)-h(t')|\le z(C)=z(J)$ for $t\in C$ and $t'\in C'$, and this proves (18.130).

We define

$$S^*:=\sum_{J\in\mathcal Q}z(J)=\sum_{\ell\ge4}\sum\bigl\{z(C);\ C\in\mathcal C_\ell,\ C\cap I^*\ne\emptyset\bigr\}. \tag{18.139}$$
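Using Lemma 18.7.2 in the inequality, we obtain

$$\Bigl|\int f_1(\sigma)h(\sigma,\tau^2,\tau^3)\,d\nu_1(\sigma)\Bigr|=2^{-p}\Bigl|\sum_{\sigma\in I}f_1(\sigma)h(\sigma,\tau^2,\tau^3)\Bigr|\le2^{-p}\operatorname{card}I\cdot S^*=2^{-p+q_1}S^*.$$

Recalling the quantity $S$ of (18.136), we now prove that $2^{q_1}S^*=2^pS$, finishing the proof of (18.136) and hence of Proposition 18.7.1. For this, we observe that if $C\in\mathcal C_\ell$ is such that $C\cap I^*\ne\emptyset$, then $J=\psi^{-1}(C)$ is a dyadic interval with $J\cap I\ne\emptyset$. Moreover, since (18.138) implies $\operatorname{card}J=2^{n_1(\ell)}\le\operatorname{card}I=2^{q_1}$, we have $J\subset I$, so that $\operatorname{card}(C\cap I^*)=2^{n_1(\ell)}$. Consequently,

$$z(C)=2^{-n_1(\ell)}\sum_{\sigma\in J}|f_1(\sigma)|R_\ell(\sigma,\tau^2,\tau^3), \tag{18.140}$$

because there are $2^{n_1(\ell)}$ non-zero terms in the summation, and for each of these terms, $|f_1(\sigma)|=1$ and $R_\ell(\sigma,\tau^2,\tau^3)=z(C)$. We rewrite (18.140) as

$$z(C)=2^{p-n_1(\ell)}\int_J|f_1(\sigma)|R_\ell(\sigma,\tau^2,\tau^3)\,d\nu_1(\sigma).$$

Summation of the relations (18.140) over $C\in\bigcup_\ell\mathcal C_\ell$ with $C\cap I^*\ne\emptyset$ (by (18.138), only values $4\le\ell\le k_1$ occur) then proves that $2^pS=2^{q_1}S^*$. □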


18.8 Probability, II 
We go back to the problem of bounding the quantities (18.120)

$$\sum_{v\in V(q_1,q_2,q_3)}|a_v(h)||Y_v|, \tag{18.141}$$

where the r.v.s $Y_v$ are defined in (18.113). We think of $q_1$, $q_2$, and $q_3$ as fixed, and we write again $q=q_1+q_2+q_3$ and $V=V(q_1,q_2,q_3)$. We will then bound with high probability the sum (18.141) by a quantity $C(q_1,q_2,q_3)$ in such a way that the sum over $q_1$, $q_2$, and $q_3$ of these quantities is $\le L\sqrt{m_0}\,2^{3p}A$. The plan is to combine the bound of Proposition 18.7.1 with probabilistic estimates. Computation of $\mathbb EY_v^2$ shows that $|Y_v|$ is typically of size about $\sqrt{m_0}\,2^{q/2}$, but some of the quantities $|Y_v|$ might be much larger than their typical values.
We will use the same kindergarten technique as in Sect. 18.6, namely, if the $b_v$ are positive numbers and if

$$A_r=\max\Bigl\{\sum_{v\in I}|Y_v|;\ I\subset V,\ \operatorname{card}I=r\Bigr\}, \tag{18.142}$$

then

$$\sum_{v\in V}b_v|Y_v|\le A_1\sum_{v\in V}b_v \tag{18.143}$$

and

$$\sum_{v\in V}b_v|Y_v|\le\frac{A_r}r\sum_{v\in V}b_v+A_r\max_{v\in V}b_v. \tag{18.144}$$

The random quantity $A_r$ depends also on $q_1$, $q_2$, and $q_3$, but this is kept implicit.


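Lemma 18.8.1 For each $u>0$ and $r\le2^{3p}$, we have

$$\mathbb P(A_r\ge u)\le2^{r+1}\Bigl(\frac{e2^{3p}}r\Bigr)^r\exp\Bigl(-\frac1L\min\Bigl(u,\frac{u^2}{2^qm_0r}\Bigr)\Bigr). \tag{18.145}$$

Proof Since the functions $v\in V$ have disjoint supports, it should be obvious from the definition (18.113) that the family of r.v.s $(Y_v)_{v\in V}$ is of type $\mathcal B(N)$. When $A_r\ge u$, we can find a set $I$ with $\operatorname{card}I=r$ and numbers $\eta_v=\pm1$, $v\in I$, with $\sum_{v\in I}\eta_vY_v\ge u$. The result then follows from (18.94) and the union bound since $2^{3p-q}\le\operatorname{card}V\le2^{3p}$ by (18.122), $N\le2^{3p+1}m_0$, and $\sum_{v\in I}\eta_v^2=r$. □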


Corollary 18.8.2 With probability $\ge1-L\exp(-100p)$, the following occurs. Consider any $1\le q_1,q_2,q_3\le p+1$ and $q=q_1+q_2+q_3$. Then for each $k\ge4$, we have

$$2^{3p}<N_{k+2},\ k\le q\ \Rightarrow\ A_1\le L\sqrt{m_0}\,2^{k/2}2^{q/2} \tag{18.146}$$

and

$$k\le q\le p,\ N_{k+2}\le2^{3p},\ r:=\lfloor2^{3p}/N_{k+2}\rfloor\ \Rightarrow\ A_r\le Lr\sqrt{m_0}\,2^{k/2}2^{q/2}. \tag{18.147}$$

Proof To prove (18.146), we observe that since $2^{3p}<N_{k+2}$, we have $2^k\ge p/L$. Considering a parameter $C\ge1$, we take $u=C\sqrt{m_0}\,2^{k/2}2^{q/2}\ge2^k$ in (18.145), so that $\min(u,u^2/(m_02^q))\ge C2^k$, and the result follows by taking $C$ a large enough constant.

To prove (18.147), we note that if $x\ge1$ we have $\lfloor x\rfloor\ge1$, so that $x\le\lfloor x\rfloor+1\le2\lfloor x\rfloor$. Therefore, since $N_{k+2}\le2^{3p}$, we have $r=\lfloor2^{3p}/N_{k+2}\rfloor\ge1$ and $2^{3p}/N_{k+2}\le2r$, so that $2^{3p}/r\le2N_{k+2}$. We first prove that $r2^k\ge p/L$. If $N_{k+2}\le2^p$, then $2^{3p}/N_{k+2}\ge2^{2p}$ and $r2^k\ge r\ge2^{2p-1}\ge p/L$. If $N_{k+2}>2^p$, this holds because then $2^k\ge p/L$.

Consider then a parameter $B\ge1$. For $u=Br\sqrt{m_0}\,2^{k/2}2^{q/2}$, since $2^{q/2+k/2}\ge2^k$, we have $\min(u,u^2/(2^qm_0r))\ge B2^kr$, and since $2^{3p}/r\le2N_{k+2}\le\exp(L2^k)$, the bound in (18.145) is $\le\exp(r2^k(L-B/L))$. Since $r2^k\ge p/L$, when $B$ is a large enough constant, we obtain $\exp(r2^k(L-B/L))\le L\exp(-100p)$. □


Corollary 18.8.2 contains all the probabilistic estimates we need. From that point on, we assume that (18.146) and (18.147) hold, and we draw consequences. Thus, all these consequences will hold with probability $\ge1-L\exp(-100p)$. We first reformulate (18.143) and (18.144).


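Proposition 18.8.3 Consider numbers $(b_v)_{v\in V}$, $b_v\ge0$. Then for $k\le q$, we have

$$2^{3p}<N_{k+2}\ \Rightarrow\ \sum_{v\in V}b_v|Y_v|\le L\sqrt{m_0}\,2^{k/2}2^{q/2}\sum_{v\in V}b_v \tag{18.148}$$

and

$$N_{k+2}\le2^{3p}\ \Rightarrow\ \sum_{v\in V}b_v|Y_v|\le L\sqrt{m_0}\,2^{k/2}2^{q/2}\Bigl(\sum_{v\in V}b_v+\frac{2^{3p}}{N_{k+2}}\max_{v\in V}b_v\Bigr). \tag{18.149}$$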


It is good to observe that (18.149) has the same form as (18.148) but with an extra term. Controlling this extra term requires controlling $\max_{v\in V}b_v$. One problem in using Proposition 18.8.3 is to decide which value of $k$ to use. If the value of $k$ is too large, the factor $2^{k/2}$ may become too large, but if the value of $k$ is too small, then it is the extra term which creates issues.


Proof (18.148) is an immediate consequence of (18.143) and (18.146). To prove (18.149), we use (18.144) for the value $r=\lfloor2^{3p}/N_{k+2}\rfloor$. In the second term of (18.144), we use the bound

$$A_r\le Lr\sqrt{m_0}\,2^{k/2}2^{q/2}\le L\frac{2^{3p}}{N_{k+2}}\sqrt{m_0}\,2^{k/2}2^{q/2}.$$

In the first term of (18.144), we use the bound $A_r/r\le L\sqrt{m_0}\,2^{k/2}2^{q/2}$. □

We will typically use Proposition 18.8.3 with $b_v=|a_v(h)|$, which is why we systematically try to estimate $\sum_{v\in V}|a_v(h)|$ and $\max_{v\in V}|a_v(h)|$.
18.9 Final Effort


We turn to the task of bounding the quantities

$$S(q_1,q_2,q_3)=\sum_{v\in V(q_1,q_2,q_3)}|a_v(h)||Y_v|$$

of (18.141). For each value of $q_1$, $q_2$, and $q_3$, our goal is to provide a bound on $S(q_1,q_2,q_3)$ that is good enough to imply

$$\sum_q\ \sum_{q_1+q_2+q_3=q}S(q_1,q_2,q_3)\le L\sqrt{m_0}\,2^{3p}A. \tag{18.150}$$

The proof relies on Proposition 18.8.3. The proof is not easy for a reason which is quite intrinsic. There are lower-order terms (many of them) for which there is plenty of room, but there are also “dangerous terms” for which there is no room whatsoever and for which every estimate has to be tight. This is to be expected when one proves an essentially optimal result.

Let us start with a simple result showing that the large values of $q$ do not create problems.


Lemma 18.9.1 We have

$$S(q_1,q_2,q_3)\le Lp2^{3p-q/6}\sqrt{m_0}\,A. \tag{18.151}$$

This bound is not useful for $q$ small because of the extra factor $p$. But it is very useful for large values of $q$ because of the factor $2^{-q/6}$.

Proof First, if $q_1=q_2=q_3=p+1$, then the unique element $v$ of $V(q_1,q_2,q_3)$ is the constant function equal to 1 and $Y_v=\sum_{t\in G}v(t)Y_t=\sum_{t\in G}Y_t=0$ (as is obvious from the definition (18.84) of $Y_t$). In all the other cases, one of the $q_j$ is $\le p$, so that $q:=q_1+q_2+q_3<3(p+1)$. We then choose $j\le3$ such that $q_j\le q/3<p+1$, so that $q_j\le p$ and we can use the bound (18.128). We use the trivial bound $n_j(\ell)\ge0$, and we get⁵

$$|a_v(h)|\le L2^{3p-2q/3}\sum_{4\le\ell\le k_j}\int|v|R_\ell\,d\nu.$$

⁵ This was our first instance of choosing between the various bounds proposed by Proposition 18.7.1. There will be several others.
Thus, since the functions $v\in V(q_1,q_2,q_3)$ have disjoint supports and satisfy $|v|\le1$, and using (18.127),

$$\sum_{v\in V(q_1,q_2,q_3)}|a_v(h)|\le L2^{3p-2q/3}A. \tag{18.152}$$

Consider the smallest $k$ for which $2^{3p}<N_{k+2}$, so that $N_{k+1}\le2^{3p}$ and then $2^k\le Lp$. Using (18.148) for $b_v=|a_v(h)|$, we obtain

$$\sum_{v\in V(q_1,q_2,q_3)}|a_v(h)||Y_v|\le L\sqrt{m_0}\,2^{k/2}2^{q/2}\sum_{v\in V(q_1,q_2,q_3)}|a_v(h)|,$$

and combining with (18.152) and using that $2^k\le Lp$ yields (18.151). □


To illustrate the use of (18.151), we note that there are at most $q^2$ possible choices of $q_1$, $q_2$, and $q_3$ for which $q_1+q_2+q_3=q$, so that

$$\sum_{q_1+q_2+q_3=q}S(q_1,q_2,q_3)\le Lpq^22^{3p-q/6}\sqrt{m_0}\,A. \tag{18.153}$$

Summing these inequalities over $q\ge p$ and using that $p\sum_{q\ge p}q^22^{-q/6}\le L$ shows that in the sum (18.150), we need only be concerned with the values of $q\le p$.
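The two counting facts used here can be checked numerically. In the sketch below (hypothetical names), `compositions(q)` counts the triples $(q_1,q_2,q_3)$ of positive integers with sum $q$. It ignores the upper bound $p+1$, which can only reduce the count. `tail(p)` is a truncation of $p\sum_{q\ge p}q^22^{-q/6}$, which stays bounded and tends to $0$ as $p$ grows.

```python
def compositions(q):
    """Number of (q1, q2, q3) with q_j >= 1 and q1 + q2 + q3 = q, by brute force."""
    return sum(1 for q1 in range(1, q) for q2 in range(1, q) if q - q1 - q2 >= 1)


def tail(p, q_max=3000):
    """Truncation at q_max of p * sum_{q >= p} q^2 2^(-q/6)."""
    return p * sum(q * q * 2 ** (-q / 6) for q in range(p, q_max + 1))
```

The exact count is $\binom{q-1}2\le q^2$. The constant $L$ in the tail bound is not small (the maximum of `tail` over $p$ is of order $10^4$), but it is a universal constant, which is all the argument needs.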
From now on, we always assume that $q\le p$. We classify the elements $v\in V$ by the integer

$$k(v)=\max\Bigl\{k;\ \int|v|R_k\,d\nu\ne0\Bigr\}, \tag{18.154}$$

the largest value of $k$ for which the support of $v$ meets $B_k$ (so that $k(v)\ge4$). Given $q_1$, $q_2$, and $q_3$ and an integer $k$, we define

$$S(q_1,q_2,q_3,k)=\sum_{v\in V,\,k(v)=k}|a_v(h)||Y_v|. \tag{18.155}$$

Lemma 18.9.2 Set $q^*=\lfloor q/4\rfloor$. Then for $k'\le q^*$, we have

$$S(q_1,q_2,q_3,k')\le L\sqrt{m_0}\,2^{3p}\Bigl(2^{-q/24}A+\frac{2^{2q}}{N_{q^*+1}}\Bigr). \tag{18.156}$$
Proof In one sentence, this bound follows from Proposition 18.8.3 by taking $k = q^*$. We choose $j$ such that $q_j\le q/3$, so that, using again the trivial bound $2^{-n_j(\ell)}\le 1$, the bound (18.128) implies that for $v\in\mathcal V$,
$$|a_v(h)|\le L2^{3p-2q/3}\sum_{4\le\ell\le k_j}\int|v|R_\ell\,d\nu\,. \tag{18.157}$$


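We recall that $\sum_{v\in\mathcal V}\int|v|R_\ell\,d\nu\le\int R_\ell\,d\nu$ because the functions $v\in\mathcal V$ have disjoint supports and satisfy $|v|\le 1$. Using this in the first inequality and (18.127) in the second inequality,
$$\sum_{v\in\mathcal V}|a_v(h)|\le L2^{3p-2q/3}\sum_{\ell\ge 4}\int R_\ell\,d\nu\le L2^{3p-2q/3}A\,. \tag{18.158}$$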


It is the factor $2^{-2q/3}$ which saves the day. We are going to apply Proposition 18.8.3 with $k = q^*$. The factor $2^{k/2+q/2}$ on the right of (18.148) and (18.149) is then $2^{q^*/2+q/2}\le 2^{5q/8}$, and $5q/8 - 2q/3 = -q/24$.

When $k(v)\le q^*$, by definition of $k(v)$ we have $\int|v|R_\ell\,d\nu = 0$ for $\ell>q^*$, so that (18.157) implies
$$k(v)\le q^*\ \Rightarrow\ |a_v(h)|\le L2^{3p-2q/3}\sum_{4\le\ell\le q^*}\int|v|R_\ell\,d\nu\,.$$
Since $R_\ell\le N_{\ell+1}$, $\int|v|\,d\nu = 2^{q-3p}$, and the supports of the functions $R_\ell$ are disjoint, we obtain $|a_v(h)|\le L2^qN_{q^*+1}$.


Let us now consider $k'\le q^*$ and set $b_v = |a_v(h)|$ if $k(v) = k'$ and $b_v = 0$ otherwise. Using (18.158) in the first inequality, we have proved that
$$\sum_{v\in\mathcal V}b_v\le L2^{3p-2q/3}A\,;\qquad \max_{v\in\mathcal V}b_v\le L2^qN_{q^*+1}\,. \tag{18.159}$$


If $2^{3p}\le N_{q^*+2}$, we use (18.148) for $k = q^*$ and (18.159) to obtain
$$S(q_1,q_2,q_3,k')\le L\sqrt{m_p}\,2^{q^*/2}2^{q/2}2^{3p-2q/3}A\,,$$
which implies (18.156) since $q^*\le q/4$ (and hence $q^*/2 + q/2 - 2q/3\le -q/24$). If $N_{q^*+2} < 2^{3p}$, we then use (18.149) and (18.159) to obtain
$$S(q_1,q_2,q_3,k')\le L\sqrt{m_p}\,2^{q^*/2+q/2}\Bigl(2^{3p-2q/3}A + \frac{2^{3p}}{N_{q^*+2}}\,2^qN_{q^*+1}\Bigr)\,. \tag{18.160}$$
Using that $N_{q^*+2} = N_{q^*+1}^2$ and since $q^*\le q/4$, we obtain (18.156) again. □


The bound (18.156) sums well over $q$ since $N_{q^*+1}$ is so large.
Matters become more complicated in the case $k(v)>q^*$, because we no longer have a strong bound such as (18.159) for the values of $|a_v(h)|$. So we have to use a larger value of $k$ to be able to control the last term in (18.149). But then the factor $2^{k/2+q/2}$ becomes large, and we have to be more sophisticated; we can no longer afford to use the crude bound $2^{-n_j(\ell)}\le 1$ in (18.128). In fact, in that case, there are "dangerous terms" for which there is little room. How to use the factor $2^{-n_j(\ell)}$ is described in Lemma 18.9.4.


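Let us set⁶
$$B_k = \int R_k\,d\nu\,, \tag{18.161}$$
so that (18.127) implies
$$\sum_{k\ge 4}B_k\le LA\,, \tag{18.162}$$
and we also have
$$\sum_{v\in\mathcal V}\int|v|R_k\,d\nu\le B_k\,. \tag{18.163}$$
Let us also define
$$D(k,q_1,q_2,q_3) = \prod_{j\le 3}\mathbf 1_{\{n_j(k)\le q_j\}}\,. \tag{18.164}$$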


Proposition 18.9.3 For $k>q^* = \lfloor q/4\rfloor$, we have
$$S(q_1,q_2,q_3,k)\le LD(k,q_1,q_2,q_3)\sqrt{m_p}\,2^{3p}2^{(k-q)/6}(B_k+B_{k-1}) + L\sqrt{m_p}\,2^{3p}A\Bigl(\frac{2^{3q/2+k/2}}{N_{k-1}} + \frac{2^{3q/2+k/2}}{N_{k+1}}\Bigr)\,. \tag{18.165}$$
The crucial term is the factor $2^{(k-q)/6}$, which will sum nicely over $q\ge k$. There is plenty of room to estimate the second-order quantities represented by the second term.

The proof of Proposition 18.9.3 requires two lemmas. 


Lemma 18.9.4 If $k(v) = k>q^*$, we have
$$|a_v(h)|\le b_v(h) + c_v(h)\,, \tag{18.166}$$


6 $B_k$ depends on $h$ although this is not indicated in the notation.
where
$$c_v(h) = L2^{3p}\sum_{4\le\ell\le k-2}\int|v|R_\ell\,d\nu \tag{18.167}$$
and
$$b_v(h) = LD(k,q_1,q_2,q_3)\,2^{3p-2q/3-k/3}\int|v|(R_k+R_{k-1})\,d\nu\,. \tag{18.168}$$


As we will show in the next lemma, the terms $c_v(h)$ are a secondary nuisance, but the terms $b_v(h)$ will be harder to control, because to control them all the estimates have to be tight. The purpose of the decomposition is to identify this "dangerous part" $b_v(h)$ of $|a_v(h)|$ and, when using the bound (18.129), to choose the value of $j$ with the goal of controlling this dangerous part as well as possible, in particular by creating the crucial factor $2^{-2q/3-k/3}$ in (18.168).


Proof We recall the bound (18.128): If $k_j$ is the largest value of $k$ for which $n_j(k)<q_j$, then for any $j\le 3$,
$$|a_v(h)|\le L2^{3p-q}\sum_{4\le\ell\le k_j}2^{q_j-n_j(\ell)}\int|v|R_\ell\,d\nu\,.$$
We split the summation to obtain $|a_v(h)|\le b_{v,j}(h) + c_v(h)$, where
$$b_{v,j}(h) = L2^{3p-q+q_j}\sum_{\ell\in\{k,k-1\},\ 4\le\ell\le k_j}2^{-n_j(\ell)}\int|v|R_\ell\,d\nu \tag{18.169}$$
and $c_v(h)$ is given by (18.167), using also there the crude bound $q_j - n_j(\ell)\le q$. Let us now choose $j$. If for some $j\le 3$ we have $k\ge k_j+2$, we choose such a $j$. Then the term $b_{v,j}(h)$ is zero because there is no term in the summation, and we are done. Otherwise, $k\le k_j+1$ for all $j\le 3$, and since $n_j(k-1)\ge n_j(k)-1$, we have
$$b_{v,j}(h)\le L2^{3p-q+q_j-n_j(k)}\int|v|(R_k+R_{k-1})\,d\nu\,.$$
Since $k\le k_j+1$ for each $j\le 3$, we have $n_j(k)\le n_j(k_j+1)\le n_j(k_j)+1\le q_j$, and thus $D(k,q_1,q_2,q_3) = 1$. We choose $j\le 3$ such that
$$q_j - n_j(k)\le\frac13\sum_{j'\le 3}\bigl(q_{j'} - n_{j'}(k)\bigr) = \frac13\Bigl(q - \sum_{j'\le 3}n_{j'}(k)\Bigr) = \frac13(q-k)\,,$$
so that $b_{v,j}(h)\le b_v(h)$, where $b_v(h)$ is given by (18.168). □
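The final choice of $j$ is a pigeonhole step: the smallest of three numbers is at most their average, and multiplying $q_j - n_j(k)\le(q-k)/3$ by 3 is exactly what turns the exponent $3p-q+q_j-n_j(k)$ into at most $3p-2q/3-k/3$. A minimal Python sketch of that selection (function names are illustrative only), with an exhaustive check on small triples in the case $D(k,q_1,q_2,q_3)=1$:

```python
from itertools import product


def choose_j(q, n_k):
    """Index j (0-based) minimizing q_j - n_j(k), as in the proof of Lemma 18.9.4."""
    return min(range(3), key=lambda j: q[j] - n_k[j])


def pigeonhole_holds(q, n_k):
    """Check 3*(q_j - n_j(k)) <= q - k for the chosen j, with q = sum(q) and k = sum(n_k)."""
    j = choose_j(q, n_k)
    return 3 * (q[j] - n_k[j]) <= sum(q) - sum(n_k)


# Exhaustive check over small triples with n_j(k) <= q_j for every j.
ok = all(
    pigeonhole_holds(q, n)
    for q in product(range(6), repeat=3)
    for n in product(range(6), repeat=3)
    if all(n[j] <= q[j] for j in range(3))
)
print(ok)  # True
```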
We lighten notation by writing $\sum_{v\in\mathcal V(q_1,q_2,q_3),\,k(v)=k}$ as $\sum_{k(v)=k}$.
Lemma 18.9.5 We have
$$\sum_{k(v)=k}c_v(h)\le L2^{3p}A\,\frac{2^q}{N_{k-1}}\,. \tag{18.170}$$


Here we see how useful it is to have controlled separately the small values of $k$ and to assume $k>q^*$, which ensures that $2^q/N_{k-1}$ is very small.


Proof We bound the sum by the number of terms times the maximum of each term. Since $s(k) = k$, combining (18.107) and (18.106) shows that $\operatorname{card}\mathcal C_k\le LA2^{3p-s(k)}/N_k$. Each $C\in\mathcal C_k$ has cardinality $\le 2^{s(k)}$ and can meet at most $2^{s(k)}$ supports of different functions $v$, so that
$$\operatorname{card}\{v\in\mathcal V;\ k(v)=k\}\le LA\,\frac{2^{3p}}{N_k}\,. \tag{18.171}$$
Since $R_\ell\le N_{\ell+1}$, we have, using that $\int|v|\,d\nu = 2^{q-3p}$ for $v\in\mathcal V$,
$$\sum_{4\le\ell\le k-2}\int|v|R_\ell\,d\nu\le N_{k-1}\int|v|\,d\nu\le L2^{q-3p}N_{k-1}\,, \tag{18.172}$$
so that $|c_v(h)|\le L2^qN_{k-1}$. Combining with (18.171) proves the result since $N_k = N_{k-1}^2$. □


Proof of Proposition 18.9.3. From Lemma 18.9.4 and (18.170), (18.163), we obtain
$$\sum_{k(v)=k}|a_v(h)|\le LD(k,q_1,q_2,q_3)\,2^{3p}2^{-2q/3-k/3}(B_k+B_{k-1}) + L2^{3p}A\,\frac{2^q}{N_{k-1}}\,. \tag{18.173}$$
To bound $\max_{k(v)=k}|a_v(h)|$, we go back to (18.128). Since $\sum_{4\le\ell\le k}\int|v|R_\ell\,d\nu\le L2^{q-3p}N_{k+1}$, we obtain the bound
$$|a_v(h)|\le L2^qN_{k+1}\,. \tag{18.174}$$
The rest of the argument is nearly identical to the end of the proof of Lemma 18.9.2, using now (18.173) and (18.174) instead of (18.159) and (18.160). Let $b_v = |a_v(h)|$ for $k(v) = k$ and $b_v = 0$ otherwise. We use the bound (18.148) if $2^{3p}\le N_{k+2}$. Otherwise, we use the bound (18.149). This concludes the proof, using also that $N_{k+2} = N_{k+1}^2$. □


Combining Lemma 18.9.2 and Proposition 18.9.3, we obtain the following:
Proposition 18.9.6 Recalling that $q^* = \lfloor q/4\rfloor$, we have
$$S(q_1,q_2,q_3)\le L\sqrt{m_p}\,2^{3p}\Bigl(2^{-q/48}A + \sum_{k>q^*}\bigl(A(k,q_1,q_2,q_3,h) + B(k,q)\bigr)\Bigr)\,, \tag{18.175}$$
where
$$A(k,q_1,q_2,q_3,h) = D(k,q_1,q_2,q_3)\,2^{(k-q)/6}(B_k+B_{k-1})\,, \tag{18.176}$$
$$B(k,q) = A\Bigl(\frac{2^{3q/2+k/2}}{N_{k-1}} + \frac{2^{3q/2+k/2}}{N_{k+1}}\Bigr)\,. \tag{18.177}$$


Proof of Theorem 18.6.9. We have to prove that (18.109) holds. Combining (18.110) and (18.119), we have to prove that
$$\sum_{3\le q\le 3p+3}\ \sum_{q_1+q_2+q_3=q}S(q_1,q_2,q_3)\le L\sqrt{m_p}\,2^{3p}A\,. \tag{18.178}$$
Lemma 18.9.1 takes care of the summation over $q\ge p$. Control of the summation for $q\le p$ will be obtained by summing the inequalities (18.175) and interchanging the summation in $k$ and $q$. Given $q$, there are at most $q^2$ possible values of $(q_1,q_2,q_3)$ with $q = q_1+q_2+q_3$, and
$$\sum_{q\ge 1}q^2 2^{-q/48}\le L\,.$$
Also,
$$\sum_{q\ge 1}\ \sum_{k\ge q^*}q^2B(k,q)\le LA\,,$$
because $q^* = \lfloor q/4\rfloor$ and $N_k$ is doubly exponential in $k$. It remains only to take care of the contribution of the term $A(k,q_1,q_2,q_3,h)$. We will prove that
$$\sum_{q_1,q_2,q_3}\ \sum_{k\ge 4}A(k,q_1,q_2,q_3,h)\le LA\,, \tag{18.179}$$
and this will finish the proof. The first step is to exchange the order of summation:
$$\sum_{q_1,q_2,q_3}\ \sum_{k\ge 4}A(k,q_1,q_2,q_3,h) = \sum_{k\ge 4}(B_k+B_{k-1})\sum_{q_1,q_2,q_3}D(k,q_1,q_2,q_3)\,2^{(k-q)/6}\,. \tag{18.180}$$
Each term $D(k,q_1,q_2,q_3)$ is 0 or 1. When $D(k,q_1,q_2,q_3) = 1$, we have $n_j(k)\le q_j$ for each $j\le 3$. Since $\sum_{j\le 3}n_j(k) = k$, the non-negative integers $q_j - n_j(k)$ have sum $q-k$. This can happen only for $q\ge k$, and then, crudely, there are at most $(q-k+1)^2$ possible choices of $q_1$, $q_2$, and $q_3$ of a given sum $q$. That is,
$$\sum_{q_1,q_2,q_3}D(k,q_1,q_2,q_3)\,2^{(k-q)/6}\le\sum_{q\ge k}(q-k+1)^2\,2^{(k-q)/6}\le L\,. \tag{18.181}$$
The required inequality (18.179) then follows from (18.180), (18.181), and (18.162).
□
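The last inequality in (18.181) rests on the convergence of $\sum_{m\ge 0}(m+1)^2 2^{-m/6}$ (set $m = q-k$). As a sketch of how large the constant $L$ there can be taken, assuming the standard identity $\sum_{m\ge 0}(m+1)^2r^m = (1+r)/(1-r)^3$ for $0<r<1$, the following Python code compares a partial sum with the closed form. The value is about $1.5\cdot 10^3$: crude, but finite, which is all the argument needs.

```python
def partial_sum(terms: int) -> float:
    """sum_{m=0}^{terms-1} (m + 1)**2 * 2**(-m/6), i.e. the sum over q >= k in (18.181) with m = q - k."""
    r = 2.0 ** (-1.0 / 6.0)
    return sum((m + 1) ** 2 * r ** m for m in range(terms))


def closed_form() -> float:
    """Closed form of the full series: (1 + r) / (1 - r)**3 with r = 2**(-1/6)."""
    r = 2.0 ** (-1.0 / 6.0)
    return (1.0 + r) / (1.0 - r) ** 3


print(partial_sum(3000), closed_form())
```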


Chapter 19
Applications to Banach Space Theory


We concentrate on topics which make direct use of our previous results. Many 
more results in Banach space theory use probabilistic constructions, for which the 
methods of the book are relevant. Some of these results may be found in [132]. 
The reader should not miss the magnificent recent results of Gilles Pisier proved in 
Sect. 19.4. 

As is customary, we use the same notation for the norm on a Banach space $X$ and on its dual $X^*$. The norm on the dual is given by $\|x^*\| = \sup\{x^*(x);\ \|x\|\le 1\}$, so that in particular $|x^*(x)|\le\|x^*\|\|x\|$. The reader will keep in mind the duality formula
$$\|x\| = \sup\{x^*(x);\ \|x^*\|\le 1\} = \sup\{x^*(x);\ x^*\in X_1^*\}\,, \tag{19.1}$$
which is of constant use.
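As a concrete instance of (19.1), take $X = \ell^N_\infty$, whose dual is $\ell^N_1$ (a standard identification, assumed here rather than stated in the text). The supremum in (19.1) is then attained at the extreme points $\pm e_j$ of the unit ball of $\ell^N_1$. The Python sketch below (helper names are hypothetical) checks this numerically.

```python
import random


def sup_norm(x):
    """||x|| in l_infty^N."""
    return max(abs(t) for t in x)


def sup_over_signed_basis(x):
    """sup of x*(x) over x* = +e_j or -e_j, the extreme points of the unit ball of l_1^N."""
    return max(s * t for t in x for s in (1, -1))


def random_l1_ball_point(n, rng):
    """A random x* with sum_j |x*_j| <= 1 (not uniformly distributed; only a test point)."""
    w = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    total = sum(abs(t) for t in w) or 1.0
    return [t / total for t in w]


rng = random.Random(1)
x = [rng.uniform(-3.0, 3.0) for _ in range(6)]
print(sup_norm(x) == sup_over_signed_basis(x))  # True
```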
I am particularly grateful to Rafał Meller for his help with this chapter.


19.1 Cotype of Operators 


The notion of cotype of a Banach space reflects a basic geometric property of this 
space, but we will study only very limited aspects related to our previous results. 


19.1.1 Basic Definitions 


We start by recalling some basic definitions. More background can be found in 
classical books such as [27] or [137]. 
Given an operator $U$ (i.e., a continuous linear map) from a Banach space $X$ to a Banach space $Y$ and a number $q\ge 2$, we denote by $C_q^g(U)$ its Gaussian cotype-$q$ constant, that is, the smallest number $A$ (possibly infinite) for which, given any integer $n$ and any elements $x_1,\dots,x_n$ of $X$, we have
$$\Bigl(\sum_{i\le n}\|U(x_i)\|^q\Bigr)^{1/q}\le A\,\mathsf E\Bigl\|\sum_{i\le n}g_ix_i\Bigr\|\,. \tag{19.2}$$
Here, $(g_i)_{i\le n}$ are i.i.d. standard Gaussian r.v.s, the norm of $U(x_i)$ is in $Y$, and the norm of $\sum_{i\le n}g_ix_i$ is in $X$.
The occurrence of the quantity
$$\mathsf E\Bigl\|\sum_{i\le n}g_ix_i\Bigr\| = \mathsf E\sup_{x^*\in X_1^*}\sum_{i\le n}g_ix^*(x_i)\,, \tag{19.3}$$
where $X_1^* = \{x^*\in X^*;\ \|x^*\|\le 1\}$, suggests that results on Gaussian processes will bear on this notion. This is only true to a small extent. It is not really the understanding of the size of the quantity (19.3) at given $x_1,x_2,\dots,x_n$ which matters, but the fact that (19.2) has to hold for any elements $x_1,x_2,\dots,x_n$.

Given a number $q\ge 2$, we define the Rademacher cotype-$q$ constant $C_q^r(U)$ as the smallest number $A$ (possibly infinite) such that, given any integer $n$ and any elements $(x_i)_{i\le n}$ of $X$, we have
$$\Bigl(\sum_{i\le n}\|U(x_i)\|^q\Bigr)^{1/q}\le A\,\mathsf E\Bigl\|\sum_{i\le n}\varepsilon_ix_i\Bigr\|\,, \tag{19.4}$$
where $(\varepsilon_i)_{i\le n}$ are i.i.d. Bernoulli r.v.s. The name "Rademacher cotype" stems from the fact that Bernoulli r.v.s are usually (but inappropriately) called Rademacher r.v.s in Banach space theory. Since Bernoulli processes are trickier than Gaussian processes, we expect that Rademacher cotype will be harder to understand than Gaussian cotype. This certainly seems to be the case.


Proposition 19.1.1 We have
$$C_q^g(U)\le\sqrt{\frac{\pi}{2}}\,C_q^r(U)\,. \tag{19.5}$$
Proof Indeed (7.40) implies $\mathsf E\|\sum_{i\le n}\varepsilon_ix_i\|\le\sqrt{\pi/2}\,\mathsf E\|\sum_{i\le n}g_ix_i\|$. □
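A one-dimensional sanity check of the inequality used in this proof: for $X = \mathbb R$, the sum $\sum_{i\le n}g_ix_i$ is Gaussian with variance $\sum_{i\le n}x_i^2$, so $\mathsf E|\sum_{i\le n}g_ix_i| = \sqrt{2/\pi}\,(\sum_{i\le n}x_i^2)^{1/2}$, while $\mathsf E|\sum_{i\le n}\varepsilon_ix_i|\le(\sum_{i\le n}x_i^2)^{1/2}$ by the Cauchy-Schwarz inequality. The Python sketch below computes the Bernoulli side exactly by enumerating all sign patterns. It only illustrates this special case and is not a proof of (7.40).

```python
import itertools
import math


def rademacher_mean_abs(x):
    """Exact E|sum_i eps_i x_i|, averaging over all 2**n sign patterns."""
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=len(x)):
        total += abs(sum(s * t for s, t in zip(signs, x)))
    return total / 2 ** len(x)


def gaussian_mean_abs(x):
    """E|sum_i g_i x_i| = sqrt(2/pi) * ||x||_2, since the sum is N(0, ||x||_2^2)."""
    return math.sqrt(2 / math.pi) * math.sqrt(sum(t * t for t in x))


x = [3.0, -1.0, 0.5, 2.0]
print(rademacher_mean_abs(x), math.sqrt(math.pi / 2) * gaussian_mean_abs(x))
```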
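Given $q\ge 1$, we define the $(q,1)$-summing norm $\|U\|_{q,1}$ of $U$ as the smallest number $A$ (possibly infinite) such that, for any integer $n$ and any vectors $x_1,\dots,x_n$ of $X$, we have
$$\Bigl(\sum_{i\le n}\|U(x_i)\|^q\Bigr)^{1/q}\le A\max_{\varepsilon_i=\pm 1}\Bigl\|\sum_{i\le n}\varepsilon_ix_i\Bigr\|\,. \tag{19.6}$$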
It should then be obvious that $\|U\|_{q,1}\le C_q^r(U)$. Consequently,
$$\sqrt{\frac{2}{\pi}}\max\bigl(C_q^g(U),\|U\|_{q,1}\bigr)\le C_q^r(U)\,. \tag{19.7}$$


Research Problem 19.1.2 Is it true that for some universal constant $L$ and every operator $U$ between Banach spaces we have
$$C_q^r(U)\le L\max\bigl(C_q^g(U),\|U\|_{q,1}\bigr)\ ? \tag{19.8}$$
A natural approach to this question would be a positive answer to the following far-reaching generalization of the Latała-Bednorz theorem:


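Research Problem 19.1.3 (S. Kwapień) Does there exist a universal constant $L$ with the following property: Given any Banach space $X$ and elements $x_1,\dots,x_n$ of $X$, we can write $x_i = x_i' + x_i''$ where
$$\mathsf E\Bigl\|\sum_{i\le n}g_ix_i'\Bigr\|\le L\,\mathsf E\Bigl\|\sum_{i\le n}\varepsilon_ix_i\Bigr\|\,;\qquad \max_{\varepsilon_i=\pm 1}\Bigl\|\sum_{i\le n}\varepsilon_ix_i''\Bigr\|\le L\,\mathsf E\Bigl\|\sum_{i\le n}\varepsilon_ix_i\Bigr\|\ ?$$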


Exercise 19.1.4 Prove that a positive answer to Problem 19.1.3 provides a positive 
answer to Problem 19.1.2. Hint: Study the proof of Theorem 19.1.5 below. 


19.1.2 Operators from $\ell^N_\infty$


We now specialize to the case where $X$ is the space $\ell^N_\infty$ of sequences $x = (x_j)_{j\le N}$ provided with the norm
$$\|x\| = \sup_{j\le N}|x_j|\,,$$
and we give a positive answer to Problem 19.1.2.¹


Theorem 19.1.5 Given $q\ge 2$ and an operator $U$ from $\ell^N_\infty$ to a Banach space $Y$, we have
$$\sqrt{\frac{2}{\pi}}\max\bigl(C_q^g(U),\|U\|_{q,1}\bigr)\le C_q^r(U)\le L\max\bigl(C_q^g(U),\|U\|_{q,1}\bigr)\,. \tag{19.9}$$


1 It is possible to show that similar results hold in the case where $X = C(W)$, the space of continuous functions over a compact topological space $W$. This is deduced from the case $X = \ell^N_\infty$ using a reduction technique unrelated to the methods of this book; see [60].
The reason we succeed is that in the case $X = \ell^N_\infty$ we can give a positive answer to Problem 19.1.3, as a simple consequence of the Latała-Bednorz theorem.²


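Proposition 19.1.6 Consider $n$ elements $x_1,\dots,x_n$ of $\ell^N_\infty$. Then we can find a decomposition $x_i = x_i' + x_i''$ such that
$$\mathsf E\Bigl\|\sum_{i\le n}g_ix_i'\Bigr\|\le L\,\mathsf E\Bigl\|\sum_{i\le n}\varepsilon_ix_i\Bigr\| \tag{19.10}$$
and
$$\max_{\varepsilon_i=\pm 1}\Bigl\|\sum_{i\le n}\varepsilon_ix_i''\Bigr\|\le L\,\mathsf E\Bigl\|\sum_{i\le n}\varepsilon_ix_i\Bigr\|\,. \tag{19.11}$$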


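Proof of Theorem 19.1.5. We shall prove the right-hand side inequality of (19.9):
$$C_q^r(U)\le L\bigl(C_q^g(U) + \|U\|_{q,1}\bigr)\,. \tag{19.12}$$
Let us consider a decomposition $x_i = x_i' + x_i''$ as in Proposition 19.1.6. Then
$$\Bigl(\sum_{i\le n}\|U(x_i')\|^q\Bigr)^{1/q}\le C_q^g(U)\,\mathsf E\Bigl\|\sum_{i\le n}g_ix_i'\Bigr\|\le LC_q^g(U)\,\mathsf E\Bigl\|\sum_{i\le n}\varepsilon_ix_i\Bigr\| \tag{19.13}$$
$$\Bigl(\sum_{i\le n}\|U(x_i'')\|^q\Bigr)^{1/q}\le\|U\|_{q,1}\max_{\varepsilon_i=\pm 1}\Bigl\|\sum_{i\le n}\varepsilon_ix_i''\Bigr\|\le L\|U\|_{q,1}\,\mathsf E\Bigl\|\sum_{i\le n}\varepsilon_ix_i\Bigr\|\,. \tag{19.14}$$
Since $\|U(x_i)\|\le\|U(x_i')\| + \|U(x_i'')\|$, the triangle inequality in $\ell^q$ implies
$$\Bigl(\sum_{i\le n}\|U(x_i)\|^q\Bigr)^{1/q}\le\Bigl(\sum_{i\le n}\|U(x_i')\|^q\Bigr)^{1/q} + \Bigl(\sum_{i\le n}\|U(x_i'')\|^q\Bigr)^{1/q}\,,$$
and combining with (19.13) and (19.14), this proves (19.12). □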


Exercise 19.1.7 Before you read the next proof, make sure that you understand the following reformulation of the Latała-Bednorz theorem: Given any subset $T$ of $\ell^2$, we can write $T\subset T_1 + T_2$ with $\mathsf E\sup_{t\in T_1}\sum_{i\ge 1}g_it_i\le Lb(T)$ and $\sup_{t\in T_2}\|t\|_1\le Lb(T)$.


Proof of Proposition 19.1.6. Let us write $x_i = (x_{ij})_{1\le j\le N}$. For $1\le j\le N$, consider $t_j\in\mathbb R^n$ given by $t_j = (x_{ij})_{i\le n}$. Let $t_0 = 0\in\mathbb R^n$ and consider


2 Weaker results suffice, and the author proved Theorem 19.1.5 long before the Bernoulli conjecture was solved.
$T = \{t_0,t_1,\dots,t_N\}$, so that
$$b(T) = \mathsf E\max\Bigl(0,\sup_{1\le j\le N}\sum_{i\le n}\varepsilon_ix_{ij}\Bigr)\le\mathsf E\Bigl\|\sum_{i\le n}\varepsilon_ix_i\Bigr\|\,. \tag{19.15}$$
Theorem 6.2.8 (the Latała-Bednorz theorem, in the formulation of Exercise 19.1.7) provides for $0\le j\le N$ a decomposition $t_j = t_j' + t_j''$, where $t_j' = (x_{ij}')_{i\le n}$, $t_j'' = (x_{ij}'')_{i\le n}$, and
$$\mathsf E\sup_{0\le j\le N}\sum_{i\le n}g_ix_{ij}'\le Lb(T) \tag{19.16}$$
$$\forall j\le N,\quad\sum_{i\le n}|x_{ij}''|\le Lb(T)\,. \tag{19.17}$$
Since $t_0 = 0 = t_0' + t_0''$, for each $0\le j\le N$ we can replace $t_j'$ by $t_j' - t_0'$ and $t_j''$ by $t_j'' - t_0''$, so that we may assume that $t_0' = t_0'' = 0$. For $i\le n$, we consider the elements $x_i' = (x_{ij}')_{j\le N}$ and $x_i'' = (x_{ij}'')_{j\le N}$ of $\ell^N_\infty$. Thus $x_i = x_i' + x_i''$. Obviously (19.17) implies (19.11).

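Let us now prove (19.10). When a process $(X_t)_{t\in T'}$ is symmetric and $X_s = 0$ for some $s\in T'$, then Lemma 2.2.1 implies
$$\mathsf E\sup_{t\in T'}|X_t|\le\mathsf E\sup_{s,t\in T'}|X_s - X_t| = 2\,\mathsf E\sup_{t\in T'}X_t\,.$$
Using this for $X_t = \sum_{i\le n}g_ix_{ij}'$ when $t = t_j'$ and $T' = \{t_0',t_1',\dots,t_N'\}$ yields (using that $X_{t_0'} = 0$ since $t_0' = 0$, and using also (19.16) in the last inequality)
$$\mathsf E\Bigl\|\sum_{i\le n}g_ix_i'\Bigr\| = \mathsf E\sup_{j\le N}\Bigl|\sum_{i\le n}g_ix_{ij}'\Bigr|\le 2\,\mathsf E\sup_{0\le j\le N}\sum_{i\le n}g_ix_{ij}'\le Lb(T)\,. \quad\square$$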


19.1.3 Computing the Cotype-2 Constant with Few Vectors 


The results of the present section are included not because they are very important 
but because the author cannot help feeling that they are part of an unfinished story 
and keeps hoping that someone will finish this story. The main result of the section 
is arguably a new comparison theorem between Gaussian and Rademacher averages 
(Theorem 19.1.11 below) which makes full use of Theorem 6.6.1. 

When $U$ is an operator between two finite-dimensional Banach spaces $X$ and $Y$, we recall the definition (19.4) of the Rademacher cotype-2 constant $C_2^r(U)$ of $U$.
Definition 19.1.8 Let us associate to each Banach space $X$ an integer $M(X)$. We say that $M(X)$ vectors suffice to compute the Rademacher cotype-2 constant of an operator from $X$ to any Banach space $Y$ if for any such operator $U$ one can find vectors $x_1,\dots,x_{M(X)}$ in $X$ with
$$\Bigl(\sum_{i\le M(X)}\|U(x_i)\|^2\Bigr)^{1/2}\ge\frac1L\,C_2^r(U)\,\mathsf E\Bigl\|\sum_{i\le M(X)}\varepsilon_ix_i\Bigr\|\,. \tag{19.18}$$
N. Tomczak-Jaegermann proved [137] that "$N$ vectors suffice to compute the Gaussian cotype-2 constant of an operator from any Banach space $X$ of dimension $N$". This motivated the previous definition.³ Our main result is that $N\log N\log\log N$ vectors suffice to compute the Rademacher cotype-2 constant of an operator $U$ from a Banach space $X$ of dimension $N$. It does not appear to be known whether $N$ vectors suffice.

Consider a Banach space $X$ of dimension $N\ge 3$, and its dual $X^*$. Consider elements $x_1,\dots,x_n$ in $X$ and assume without loss of generality that they span $X$ (we will typically have $n\gg N$). We will now perform some constructions, and the reader should keep in mind that they depend on this sequence $(x_i)_{i\le n}$. We identify $X^*$ with a subspace of $\ell^2_n$ by the map $x^*\mapsto(x^*(x_i))_{i\le n}$, so that
$$\|x^*\|_2 = \Bigl(\sum_{i\le n}x^*(x_i)^2\Bigr)^{1/2}\,. \tag{19.19}$$
This norm arises from the dot product given by
$$\langle x^*,y^*\rangle = \sum_{i\le n}x^*(x_i)\,y^*(x_i)\,.$$
Consider an orthonormal basis $(e_j^*)_{j\le N}$ of $X^*$ for this dot product. Then
$$\Bigl\|\sum_{j\le N}\beta_je_j^*\Bigr\|_2^2 = \sum_{j\le N}\beta_j^2\,,$$
so that the elements $x^*$ of $X^*$ with $\|x^*\|_2\le 1$ are exactly the elements $\sum_{j\le N}\beta_je_j^*$ with $\sum_{j\le N}\beta_j^2\le 1$. The dual norm $\|\cdot\|_2$ on $X$ is then given by
$$\|x\|_2 = \sup\{|x^*(x)|;\ \|x^*\|_2\le 1\} = \sup\Bigl\{\Bigl|\sum_{j\le N}\beta_je_j^*(x)\Bigr|;\ \sum_{j\le N}\beta_j^2\le 1\Bigr\} = \Bigl(\sum_{j\le N}e_j^*(x)^2\Bigr)^{1/2}\,. \tag{19.20}$$

3 Similar questions in various settings are also investigated, for example, in [43]. 
Using (19.19) with $x^* = e_j^*$, we obtain $1 = \|e_j^*\|_2^2 = \sum_{i\le n}e_j^*(x_i)^2$, so that, using (19.20) to compute $\|x_i\|_2$, we get
$$\sum_{i\le n}\|x_i\|_2^2 = \sum_{i\le n}\sum_{j\le N}e_j^*(x_i)^2 = \sum_{j\le N}\sum_{i\le n}e_j^*(x_i)^2 = N\,. \tag{19.21}$$
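In coordinates, (19.21) is the familiar fact that the squared row norms of a matrix with orthonormal columns sum to the number of columns. The Python sketch below assumes $X = \mathbb R^N$, with $x^*(x)$ given by the dot product (an identification made only for illustration). It stores the $x_i$ as the rows of an $n\times N$ matrix $A$ and orthonormalizes the columns of $A$ by modified Gram-Schmidt, so that the $j$-th resulting column lists the values $e_j^*(x_i)$, $i\le n$. It then computes $\|x_i\|_2^2$ through (19.20). The helper names are hypothetical.

```python
import math
import random


def orthonormal_columns(A):
    """Modified Gram-Schmidt on the columns of A (n rows, N columns, full column rank).

    Returns basis with basis[j][i] = e_j^*(x_i).
    """
    n, N = len(A), len(A[0])
    basis = []
    for j in range(N):
        w = [A[i][j] for i in range(n)]
        for b in basis:  # remove components along the previous orthonormal columns
            dot = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - dot * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis


def dual_norms_squared(A):
    """||x_i||_2^2 = sum_j e_j^*(x_i)^2, as in (19.20)."""
    E = orthonormal_columns(A)
    return [sum(E[j][i] ** 2 for j in range(len(E))) for i in range(len(A))]


rng = random.Random(0)
A = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(40)]  # n = 40 vectors in R^5
norms = dual_norms_squared(A)
print(sum(norms))  # equals N = 5 up to rounding
```

The same computation also shows $\|x_i\|_2\le 1$, and, after sorting the $\|x_i\|_2$ in non-increasing order, $\|x_i\|_2^2\le N/i$, which are the two facts used in the proof of Lemma 19.1.10.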


Considering independent standard normal r.v.s $(\eta_j)_{j\le N}$, $G = \sum_{j\le N}\eta_je_j^*$ is a standard Gaussian random vector valued in $(X^*,\|\cdot\|_2)$. For a subset $T$ of $X$, we define
$$g(T) = \mathsf E\sup_{x\in T}G(x) = \mathsf E\sup_{x\in T}\sum_{j\le N}\eta_je_j^*(x)\,. \tag{19.22}$$
The reason for the notation is that $g(T)$ is the usual quantity when we consider $T$ as a subset of the Hilbert space $(X,\|\cdot\|_2)$, simply because by (19.20), the map $x\mapsto(e_j^*(x))_{j\le N}$ is an isometry from $(X,\|\cdot\|_2)$ to $\ell^2_N$.⁴ For further use, we spell out now a consequence of (19.22) and of Sudakov's dual minoration (Lemma 13.3.9).⁵
Lemma 19.1.9 Consider a subset $T$ of $X$ with $T = -T$ and the semi-norm $\|\cdot\|_T$ on $X^*$ given by $\|x^*\|_T = \sup_{x\in T}x^*(x)$. Then the unit ball of $(X^*,\|\cdot\|_2)$ can be covered by $N_n$ balls for the norm $\|\cdot\|_T$ of radius $Lg(T)2^{-n/2}$.
Proof It follows from (19.22) that $g(T) = \mathsf E\|G\|_T$. The conclusion then follows from Lemma 15.2.7. □


We will not use the formula (19.22) anymore, but only the general fact that in any Hilbert space, if $T = \{\pm t_k;\ k\ge 1\}$, then
$$g(T)\le L\sup_{k\ge 1}\bigl(\|t_k\|\sqrt{\log(k+1)}\bigr)\,, \tag{19.23}$$
as shown in Proposition 2.11.6. Let us stress again that the quantity $g(T)$ depends on $T$ and on the whole sequence $(x_i)_{i\le n}$.


Lemma 19.1.10 If $T = \{\pm x_1,\dots,\pm x_n\}$, then
$$g(T)\le L\sqrt{\log(N+1)}\,. \tag{19.24}$$
When the sequence $(\|x_i\|_2)_{i\le n}$ is non-increasing, and if $M = \lfloor N\log N\rfloor$, the set $T' = \{\pm x_i;\ M<i\le n\}$ satisfies
$$g(T')\le L\,. \tag{19.25}$$


4 Please keep in mind, however, that this embedding of $T$ in a Hilbert space depends on $x_1,\dots,x_n$, and so does the quantity $g(T)$.


5 Please note that the original norm of T plays no part in this result. 
Proof Both results are based on (19.23). Since the norm $\|\cdot\|_2$ on $X$ is the dual norm of the norm (19.19), it is obvious that $\|x_i\|_2\le 1$. Assuming that the sequence $(\|x_i\|_2)_{i\ge 1}$ is non-increasing, we see from (19.21) that $\|x_i\|_2\le\sqrt{N/i}$ and thus $\|x_i\|_2\le\min(1,\sqrt{N/i})$. Using (19.23) for the sequence $t_k = x_k$ ($1\le k\le n$), we obtain
$$g(T)\le L\sup_{k\ge 1}\Bigl(\min\Bigl(1,\sqrt{\frac Nk}\Bigr)\sqrt{\log(k+1)}\Bigr)\le L\sqrt{\log N}\,.$$
Using again (19.23) for the sequence $t_k = x_{M+k}$ ($1\le k\le n-M$), we now obtain
$$g(T')\le L\sup_{k\ge 1}\Bigl(\sqrt{\frac{N}{M+k}}\sqrt{\log(k+1)}\Bigr)\le L\sqrt{\frac NM\log M}\le L\,. \quad\square$$
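The two suprema in this proof can be evaluated numerically. The Python sketch below (function names are hypothetical, and natural logarithms are assumed) computes them over a finite range of $k$ that is large enough to contain the maximizers. It then compares the first with $\sqrt{\log N}$ and the second with a constant.

```python
import math


def sup_first(N, kmax):
    """max over 1 <= k <= kmax of min(1, sqrt(N/k)) * sqrt(log(k + 1))."""
    return max(min(1.0, math.sqrt(N / k)) * math.sqrt(math.log(k + 1))
               for k in range(1, kmax + 1))


def sup_second(N, kmax):
    """max over 1 <= k <= kmax of sqrt(N/(M + k)) * sqrt(log(k + 1)), M = floor(N log N)."""
    M = math.floor(N * math.log(N))
    return max(math.sqrt(N / (M + k)) * math.sqrt(math.log(k + 1))
               for k in range(1, kmax + 1))


for N in (3, 10, 100, 1000):
    print(N, sup_first(N, 50 * N) / math.sqrt(math.log(N)), sup_second(N, 50 * N))
```

Both printed ratios stay bounded as $N$ grows, in line with (19.24) and (19.25).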


In the next statement, we define $T = \{\pm x_1,\dots,\pm x_n\}$, and for a subset $I$ of $\{1,\dots,n\}$, we define $T^I$ as the collection of elements $\pm x_i$ for $i$ outside $I$,
$$T^I = \{\pm x_i;\ i\le n,\ i\notin I\}\,. \tag{19.26}$$


We are now ready to state our new comparison principle between Gaussian and 
Rademacher averages. 


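Theorem 19.1.11 We have
$$\mathsf E\Bigl\|\sum_{i\le n}g_ix_i\Bigr\|\le L\,\mathsf E\Bigl\|\sum_{i\le n}\varepsilon_ix_i\Bigr\|\bigl(1+g(T)\bigr)\,. \tag{19.27}$$
More generally, for any subset $I$ of $\{1,\dots,n\}$, we have
$$\mathsf E\Bigl\|\sum_{i\notin I}g_ix_i\Bigr\|\le L\,\mathsf E\Bigl\|\sum_{i\notin I}\varepsilon_ix_i\Bigr\|\Bigl(1+g(T^I)\,\frac{\mathsf E\|\sum_{i\le n}g_ix_i\|}{\mathsf E\|\sum_{i\notin I}g_ix_i\|}\Bigr)\,. \tag{19.28}$$
When $I = \emptyset$, (19.28) specializes into (19.27). Using (19.24), we see that (19.27) generalizes the classical inequality
$$\mathsf E\Bigl\|\sum_{i\le n}g_ix_i\Bigr\|\le L\sqrt{\log N}\,\mathsf E\Bigl\|\sum_{i\le n}\varepsilon_ix_i\Bigr\|\,. \tag{19.29}$$
Let us also stress that in (19.28), the quantity $g(T^I)$ is computed as in (19.22), that is, for the norm $\|\cdot\|_2$ on $X$ involving the whole sequence $(x_i)_{i\le n}$ and not only the $(x_i)_{i\notin I}$.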

We will prove Theorem 19.1.11 later. First, we draw some consequences. 
Corollary 19.1.12 There exists a subset $I$ of $\{1,\dots,n\}$ such that $\operatorname{card}I\le N\log N$ and such that either of the following holds true:
$$\mathsf E\Bigl\|\sum_{i\notin I}g_ix_i\Bigr\|\le\frac12\,\mathsf E\Bigl\|\sum_{i\le n}g_ix_i\Bigr\| \tag{19.30}$$
or else
$$\mathsf E\Bigl\|\sum_{i\notin I}g_ix_i\Bigr\|\le L\,\mathsf E\Bigl\|\sum_{i\notin I}\varepsilon_ix_i\Bigr\|\,. \tag{19.31}$$
Proof The set $I = \{1,\dots,M\}$ satisfies $g(T^I)\le L$ by (19.25), so that if (19.30) fails, (19.31) follows from (19.28). □


Corollary 19.1.13 We can find elements $y_1,\dots,y_M$ of $X$ such that
$$\frac{C_2^r(U)}{L}\,\mathsf E\Bigl\|\sum_{j\le M}\varepsilon_jy_j\Bigr\|\le\Bigl(\sum_{j\le M}\|U(y_j)\|^2\Bigr)^{1/2} \tag{19.32}$$
and $M\le N\log N\log\log N$.
We have obtained (19.18) for $M\le N\log N\log\log N$, that is, "$M$ vectors suffice to compute the Rademacher cotype-2 constant of $U$".


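Proof We find elements $x_1,\dots,x_n$ of $X$ such that
$$\frac{C_2^r(U)}{2}\,\mathsf E\Bigl\|\sum_{i\le n}\varepsilon_ix_i\Bigr\|\le\Bigl(\sum_{i\le n}\|U(x_i)\|^2\Bigr)^{1/2}\,. \tag{19.33}$$
The next step of the proof consists of showing that we can find a subset $J$ of $\{1,\dots,n\}$ with $\operatorname{card}J\le N\log N\log\log N$ such that
$$\mathsf E\Bigl\|\sum_{i\notin J}g_ix_i\Bigr\|\le L\,\mathsf E\Bigl\|\sum_{i\le n}\varepsilon_ix_i\Bigr\|\,. \tag{19.34}$$
To this aim, consider the largest integer $k_0$ with $2^{k_0}\le\sqrt{\log N}$, so that $k_0\le\log\log N$. By induction over $k$, for $k\le k_0$, we construct subsets $I_k$ of $\{1,\dots,n\}$ with $\operatorname{card}I_k\le N\log N$ and either
$$\mathsf E\Bigl\|\sum_{i\notin I_1\cup\dots\cup I_k}g_ix_i\Bigr\|\le\frac12\,\mathsf E\Bigl\|\sum_{i\notin I_1\cup\dots\cup I_{k-1}}g_ix_i\Bigr\| \tag{19.35}$$
or else
$$\mathsf E\Bigl\|\sum_{i\notin I_1\cup\dots\cup I_k}g_ix_i\Bigr\|\le L\,\mathsf E\Bigl\|\sum_{i\notin I_1\cup\dots\cup I_k}\varepsilon_ix_i\Bigr\|\,. \tag{19.36}$$
The induction step is performed by using Corollary 19.1.12 for the set of indices $\{i\le n;\ i\notin I_1\cup\dots\cup I_{k-1}\}$ rather than the set $\{1,\dots,n\}$. If at the $k$-th step (19.36) holds, we then stop the construction, and we define $J = I_1\cup\dots\cup I_k$. Thus $M := \operatorname{card}J\le kN\log N\le k_0N\log N$ and
$$\mathsf E\Bigl\|\sum_{i\notin J}g_ix_i\Bigr\|\le L\,\mathsf E\Bigl\|\sum_{i\notin J}\varepsilon_ix_i\Bigr\|\le L\,\mathsf E\Bigl\|\sum_{i\le n}\varepsilon_ix_i\Bigr\|\,,$$
so that (19.34) holds. If instead (19.36) never occurs during the construction, we continue this construction until $k = k_0$, and we define now $J = I_1\cup\dots\cup I_{k_0}$. Thus $M := \operatorname{card}J\le k_0N\log N$ and, iterating (19.35),
$$\mathsf E\Bigl\|\sum_{i\notin J}g_ix_i\Bigr\|\le 2^{-k_0}\,\mathsf E\Bigl\|\sum_{i\le n}g_ix_i\Bigr\|\,.$$
Combining with (19.29), this implies
$$\mathsf E\Bigl\|\sum_{i\notin J}g_ix_i\Bigr\|\le L2^{-k_0}\sqrt{\log N}\,\mathsf E\Bigl\|\sum_{i\le n}\varepsilon_ix_i\Bigr\|\,,$$
and this proves (19.34) by the choice of $k_0$.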
Now that we have proved (19.34), we consider two cases. 


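Case 1. We have
$$\sum_{i\in J}\|U(x_i)\|^2\ge\frac12\sum_{i\le n}\|U(x_i)\|^2\,. \tag{19.37}$$
Then, using (19.33) in the second inequality and (19.37) in the third inequality,
$$\frac{C_2^r(U)}{4}\,\mathsf E\Bigl\|\sum_{i\in J}\varepsilon_ix_i\Bigr\|\le\frac{C_2^r(U)}{4}\,\mathsf E\Bigl\|\sum_{i\le n}\varepsilon_ix_i\Bigr\|\le\frac12\Bigl(\sum_{i\le n}\|U(x_i)\|^2\Bigr)^{1/2}\le\Bigl(\sum_{i\in J}\|U(x_i)\|^2\Bigr)^{1/2}\,,$$
and this proves (19.32).⁶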


6 It is certainly disturbing at first that this case does not use at all any of the previous work. The 
point is that (19.37) is very unlikely to hold. 
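Case 2. (19.37) fails, so that we have
$$\sum_{i\notin J}\|U(x_i)\|^2\ge\frac12\sum_{i\le n}\|U(x_i)\|^2\,.$$
Then (19.33) yields
$$\frac{C_2^r(U)}{4}\,\mathsf E\Bigl\|\sum_{i\le n}\varepsilon_ix_i\Bigr\|\le\Bigl(\sum_{i\notin J}\|U(x_i)\|^2\Bigr)^{1/2}\,,$$
and combining with (19.34), we obtain
$$\frac{C_2^r(U)}{L}\,\mathsf E\Bigl\|\sum_{i\notin J}g_ix_i\Bigr\|\le\Bigl(\sum_{i\notin J}\|U(x_i)\|^2\Bigr)^{1/2}\,, \tag{19.38}$$
which implies that the Gaussian cotype-2 constant $C_2^g(U)$ of $U$ is $\ge C_2^r(U)/L$.
It is proved in [137] that the Gaussian cotype-2 constant $C_2^g(U)$ of $U$ "can be computed on $N$ vectors", so that we can find $N$ elements $y_1,\dots,y_N$ of $X$ such that
$$\frac{C_2^g(U)}{L}\,\mathsf E\Bigl\|\sum_{j\le N}g_jy_j\Bigr\|\le\Bigl(\sum_{j\le N}\|U(y_j)\|^2\Bigr)^{1/2}\,. \tag{19.39}$$
Using (6.6), we have $\mathsf E\|\sum_{j\le N}\varepsilon_jy_j\|\le L\,\mathsf E\|\sum_{j\le N}g_jy_j\|$, so that (19.39) implies (19.32) since $C_2^g(U)\ge C_2^r(U)/L$. □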


We turn to the proof of Theorem 19.1.11. We fix a set $I\subset\{1,\dots,n\}$, and we recall the set $T^I$ of (19.26). We consider the set
$$V_I = \{(x^*(x_i))_{i\in I^c};\ x^*\in X_1^*\}\subset\mathbb R^{I^c}\,,$$
where $I^c = \{1,\dots,n\}\setminus I$ and $X_1^*$ is the unit ball of $X^*$ for the original norm. On $V_I$, we consider the distance $d_\infty$ induced by the supremum norm on $\mathbb R^{I^c}$. The key step is the following:
Lemma 19.1.14 We have
$$\gamma_1(V_I,d_\infty)\le Lg(T^I)\,\mathsf E\Bigl\|\sum_{i\le n}g_ix_i\Bigr\|\,. \tag{19.40}$$
Proof of Theorem 19.1.11. By duality, we have
$$g(V_I) = \mathsf E\Bigl\|\sum_{i\notin I}g_ix_i\Bigr\|\,;\qquad b(V_I) = \mathsf E\Bigl\|\sum_{i\notin I}\varepsilon_ix_i\Bigr\|\,. \tag{19.41}$$
We appeal to Theorems 6.6.1 and 2.7.11 to obtain
$$g(V_I)\le L\Bigl(b(V_I) + \sqrt{b(V_I)\gamma_1(V_I,d_\infty)}\Bigr) = L\Bigl(b(V_I) + \sqrt{g(V_I)C}\Bigr)\,,$$
where
$$C = \frac{b(V_I)\gamma_1(V_I,d_\infty)}{g(V_I)}\le Lb(V_I)g(T^I)\,\frac{\mathsf E\|\sum_{i\le n}g_ix_i\|}{\mathsf E\|\sum_{i\notin I}g_ix_i\|}\,,$$
where we have used (19.40) and the first part of (19.41) in the inequality. Using that for any $c>0$ we have the inequality $\sqrt{xy}\le cx + y/c$, we conclude that
$$g(V_I)\le Lb(V_I) + \frac12\,g(V_I) + LC\,,$$
so that $g(V_I)\le Lb(V_I) + LC$, which, recalling (19.41), is the desired inequality (19.28). □


It remains to prove Lemma 19.1.14. The proof of this lemma involves several ingredients which have to be combined in an unusual way. One of them is the following general principle, where we recall that $N_0 = 1$ and that $N_n = 2^{2^n}$ for $n\ge 1$:
Lemma 19.1.15 Consider a set $W$ provided with two distances $d_1$ and $d_2$. Assume that for a certain number $S$ and every integer $n\ge 0$, every number $a>0$, every ball $B_{d_2}(t,a)$ of $W$ can be covered by $N_n$ sets of $d_1$-diameter at most $aS2^{-n/2}$. Then
$$\gamma_1(W,d_1)\le LS\gamma_2(W,d_2)\,.$$
Proof Consider an admissible sequence $(\mathcal B_n)$ of $W$ with
$$\forall t\in W,\quad\sum_{n\ge 0}2^{n/2}\Delta(B_n(t),d_2)\le 2\gamma_2(W,d_2)\,.$$
We construct by induction an increasing sequence of partitions $(\mathcal C_n)$ satisfying
$$\operatorname{card}\mathcal C_n\le N_{n+2} \tag{19.42}$$
$$\forall C\in\mathcal C_n,\ \exists B\in\mathcal B_n,\quad C\subset B,\quad\Delta(C,d_1)\le S2^{-n/2}\Delta(B,d_2)\,. \tag{19.43}$$
First, we set $\mathcal C_0 = \{W\}$. We note that, using the hypothesis for $a = \Delta(W,d_2)$ and $n = 0$, we have
$$\Delta(W,d_1)\le S\Delta(W,d_2)\,. \tag{19.44}$$
Thus (19.43) is true for $n = 0$. Assuming that $\mathcal C_n$ has been constructed, we split each element $C$ of $\mathcal C_n$ as follows: First, we split $C$ into the sets $C\cap B$, $B\in\mathcal B_{n+1}$. Then we split each set $C\cap B$ into $N_{n+1}$ pieces $C'$ such that
$$\Delta(C',d_1)\le S2^{-(n+1)/2}\Delta(C\cap B,d_2)\,.$$
This is possible by hypothesis, and this completes the construction of $\mathcal C_{n+1}$. Clearly, $\mathcal C_{n+1}$ consists of at most $N_{n+2}\cdot N_{n+1}^2 = N_{n+3}$ sets, and it is obvious that (19.42) and (19.43) hold for $n+1$. A consequence of (19.43) is that
$$\forall t,\quad\Delta(C_n(t),d_1)\le S2^{-n/2}\Delta(B_n(t),d_2)$$
and thus
$$\sum_{n\ge 0}2^n\Delta(C_n(t),d_1)\le S\sum_{n\ge 0}2^{n/2}\Delta(B_n(t),d_2)\le 2S\gamma_2(W,d_2)\,.$$
Using (19.44) and Lemma 2.9.10 then yields the result. □
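The counting step $N_{n+2}\cdot N_{n+1}^2 = N_{n+3}$ can be checked with exact integers, using $N_0 = 1$ and $N_n = 2^{2^n}$ for $n\ge 1$. A minimal Python sketch (the function names are illustrative):

```python
def N(n: int) -> int:
    """N_0 = 1 and N_n = 2**(2**n) for n >= 1."""
    return 1 if n == 0 else 2 ** (2 ** n)


def cardinality_bound_ok(n: int) -> bool:
    """card C_n <= N_{n+2}; card B_{n+1} <= N_{n+1}; each piece is split into N_{n+1} sets."""
    return N(n + 2) * N(n + 1) * N(n + 1) == N(n + 3)


print(all(cardinality_bound_ok(n) for n in range(8)))  # True
```

The identity holds because $N_{n+1}^2 = N_{n+2}$ and $N_{n+2}^2 = N_{n+3}$. This is why the induction keeps $\operatorname{card}\mathcal C_n\le N_{n+2}$ at every step.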


Proof of Lemma 19.1.14. Let us denote by $X_1^*$ the unit ball of $X^*$ and by $d_2$ the distance associated to the norm $\|\cdot\|_2$. By (19.1), the process given for $x^*$ in $X_1^*$ by $X_{x^*} = \sum_{i\le n}g_ix^*(x_i)$ satisfies $\sup_{x^*\in X_1^*}\sum_{i\le n}g_ix^*(x_i) = \|\sum_{i\le n}g_ix_i\|$. Theorem 2.10.1 yields
$$\gamma_2(X_1^*,d_2)\le L\,\mathsf E\Bigl\|\sum_{i\le n}g_ix_i\Bigr\|\,. \tag{19.45}$$
Consider the norm $\|\cdot\|_I$ on $X^*$ given by
$$\|x^*\|_I = \sup_{i\le n,\,i\notin I}|x^*(x_i)| = \sup_{x\in T^I}x^*(x)\,,$$
where $T^I = \{\pm x_i;\ i\le n,\ i\notin I\}$. Lemma 19.1.9 asserts that the unit ball of $(X^*,\|\cdot\|_2)$ can be covered by $N_n$ balls for the norm $\|\cdot\|_I$ of radius $Lg(T^I)2^{-n/2}$. Denoting by $d_1$ the distance associated to the norm $\|\cdot\|_I$, Lemma 19.1.15 then implies
$$\gamma_1(X_1^*,d_1)\le Lg(T^I)\gamma_2(X_1^*,d_2)\,.$$
Combining with (19.45) completes the proof since obviously $\gamma_1(V_I,d_\infty) = \gamma_1(X_1^*,d_1)$. □
19.2 Unconditionality


19.2.1 Classifying the Elements of $B_1$


Consider a general $\sigma$-finite measure space $(\Omega,\mu)$, and
$$B_1 = \Bigl\{f\in L^1(\mu);\ \int|f|\,d\mu\le 1\Bigr\}\,.$$
Theorem 19.2.1 below provides a kind of classification of the elements of $B_1$. It is at the root of Proposition 10.14.3. It will be used a number of times in the following sections, allowing us to gain an excellent control of the subsets $T$ of $B_1$ which are small in some other sense, for example, $\gamma_2(T,d_2)<\infty$. It has no content when $\mu$ is a probability and is of interest only in the case where the total mass of $\mu$ is large. The parameter $\tau$ below is of secondary importance, and one may assume $\tau = 0$ at first reading. We recall the notation $a\wedge b = \min(a,b)$.


Theorem 19.2.1 For any integer $\tau\in\mathbb Z$, there exists an admissible sequence of partitions $(\mathcal C_n)$ of $B_1$, and for each $C\in\mathcal C_n$ an integer $\ell_n(C)\in\mathbb Z$, such that if we set
$$\ell(f,n) = \ell_n(C_n(f))\,, \tag{19.46}$$
where as usual $C_n(f)$ denotes the element of $\mathcal C_n$ containing $f$, we have
$$\forall f\in B_1,\quad\int\bigl(2^{2\ell(f,n)}f^2\wedge 1\bigr)\,d\mu\le 2^{n+\tau} \tag{19.47}$$
and
$$\forall f\in B_1,\quad\sum_{n\ge 0}2^{n-\ell(f,n)}\le 18\cdot 2^{-\tau}\,. \tag{19.48}$$
We note that (19.47) implies
$$\forall f\in B_1,\quad\mu\bigl(\{|f|\ge 2^{-\ell(f,n)}\}\bigr)\le 2^{n+\tau}\,. \tag{19.49}$$
A first (and partial) understanding of the meaning of this result is that it classifies the functions $f$ of $B_1$ according to the values of the integers $\ell(f,n)$ for which
$$\mu\bigl(\{|f|\ge 2^{-\ell(f,n)}\}\bigr)\simeq 2^{n+\tau}\,.$$
The effectiveness of this result will be understood through multiple applications, the first of which is a proof of Proposition 10.14.3 at the end of the present subsection.
Lemma 19.2.2 For any number $a\in\mathbb R$, we have
$$\sum_{k\in\mathbb Z}(2^ka^2)\wedge 2^{-k}\le 8|a|\,. \tag{19.50}$$
Proof Without loss of generality, we assume that $a>0$. Consider the largest integer $k_0$ such that $2^{k_0}a\le 1$, so that $2^{k_0+1}a>1$. Thus, $2^{k_0+2}a^2\le 4a$, $2^{-k_0+1}<4a$, and
$$\sum_{k\in\mathbb Z}(2^ka^2)\wedge 2^{-k}\le\sum_{k\le k_0}2^ka^2 + \sum_{k>k_0}2^{-k} = 2^{k_0+1}a^2 + 2^{-k_0}\le 8a\,. \quad\square$$
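A quick numerical check of (19.50) in Python. The sketch below truncates the sum to $|k|\le 300$ and compares it with $8|a|$. Since the terms are non-negative, the truncated sum is at most the full sum, so this is a consistency test rather than a proof. `math.ldexp(x, k)` computes $x\cdot 2^k$.

```python
import math


def truncated_sum(a, kmin=-300, kmax=300):
    """Sum over kmin <= k <= kmax of min(2**k * a**2, 2**(-k))."""
    return sum(min(math.ldexp(a * a, k), math.ldexp(1.0, -k))
               for k in range(kmin, kmax + 1))


for a in (1e-6, 1e-3, 0.1, 0.5, 1.0, 2.0, 10.0, 1e3, 1e6, -0.7):
    print(a, truncated_sum(a), 8 * abs(a))
```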


Lemma 19.2.3 Given $f\in B_1$ and $n\ge 0$, we define $\ell(f,n)$ as the largest integer $\le 2n+\tau$ for which
$$\int\bigl(2^{2\ell(f,n)}f^2\wedge 1\bigr)\,d\mu\le 2^{n+\tau}\,. \tag{19.51}$$
Then
$$\sum_{n\ge 0}2^{n-\ell(f,n)+\tau}\le 18\,. \tag{19.52}$$
Proof Let us consider the set $J(f) = \{n;\ \ell(f,n)<2n+\tau\}$. Then, for $n\in J(f)$, we have
$$\int\bigl(2^{2\ell(f,n)+2}f^2\wedge 1\bigr)\,d\mu>2^{n+\tau}\,,$$
and therefore
$$\int\bigl(2^{\ell(f,n)+2}f^2\wedge 2^{-\ell(f,n)}\bigr)\,d\mu>2^{n-\ell(f,n)+\tau}\,. \tag{19.53}$$
It is obvious by construction that the sequence $(\ell(f,n))_n$ is non-decreasing in $n$. For $k\in\mathbb Z$, we define $J_k(f) = \{n\in J(f);\ \ell(f,n) = k\}$. Let $I(f) = \{k\in\mathbb Z;\ J_k(f)\ne\emptyset\}$. It follows from (19.53) that when $k\in I(f)$, $J_k(f)$ has a largest element $n_k$, and then, using again (19.53) in the last inequality,
$$\sum_{n\in J_k(f)}2^{n-\ell(f,n)+\tau} = \sum_{n\in J_k(f)}2^{n-k+\tau}\le 2^{n_k+1-k+\tau}\le 2\int\bigl(2^{k+2}f^2\wedge 2^{-k}\bigr)\,d\mu\,.$$
Summing the previous inequalities over $k\in I(f)$ and using (19.50), we obtain
$$\sum_{n\in J(f)}2^{n-\ell(f,n)+\tau}\le 16\int|f|\,d\mu\le 16\,.$$
The result follows since $\sum_{n\notin J(f)}2^{n-\ell(f,n)+\tau} = \sum_{n\notin J(f)}2^{-n}\le 2$. □
Proof of Theorem 19.2.1. We define $\ell(f,n)$ as in Lemma 19.2.3. Since $h^2\wedge 1\le|h|$ and since $f\in B_1$,
$$\int\bigl(2^{2(n+\tau)}f^2\wedge 1\bigr)\,d\mu\le 2^{n+\tau}\int|f|\,d\mu\le 2^{n+\tau}\,,$$
and the definition of $\ell(f,n)$ implies $\ell(f,n)\ge n+\tau$. Therefore $\tau+n\le\ell(f,n)\le\tau+2n$, so that $\ell(f,n)$ can take at most $n+1$ different values. We define $\mathcal C_0 = \{B_1\}$ and $\ell_0(B_1) = \tau$. Consider the partition $\mathcal C_n$ of $B_1$ induced by the following equivalence relation: $f$ and $f'$ are equivalent if and only if $\ell(f,m) = \ell(f',m)$ for each $m\le n$. The sequence $(\mathcal C_n)$ is increasing. Moreover, since $\ell(f,m)$ can take at most $m+1$ values, and since the values of $\ell(f,m)$ for $m\le n$ determine to which element of $\mathcal C_n$ the function $f$ belongs,
$$\operatorname{card}\mathcal C_n\le(n+1)!\le N_n\,, \tag{19.54}$$
so that the sequence $(\mathcal C_n)$ is admissible.
By construction of $\mathcal C_n$, for $f\in C\in\mathcal C_n$, $\ell(f,n)$ has a fixed value $\ell_n(C)$, that is, for $f\in C$ we have $\ell_n(C_n(f)) = \ell_n(C) = \ell(f,n)$, so that (19.46) holds. Also (19.47) holds by construction. Finally (19.48) follows from (19.52). □
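For the counting measure on a finite set, the integer $\ell(f,n)$ of Lemma 19.2.3 can be computed by a direct search downward from $2n+\tau$, since the left-hand side of (19.51) increases with $\ell$. The Python sketch below uses exact rational arithmetic (`fractions.Fraction`) to avoid rounding exactly at the threshold. The test functions are arbitrary examples with $\int|f|\,d\mu\le 1$, and the function names are hypothetical. The sketch checks the range $n+\tau\le\ell(f,n)\le 2n+\tau$, monotonicity in $n$, a partial sum of (19.52), and the crude count (19.54).

```python
import math
from fractions import Fraction


def ell(f, n, tau):
    """Largest integer l <= 2n + tau with sum_i min(4**l * f_i**2, 1) <= 2**(n + tau).

    f is a finite list of Fractions with sum |f_i| <= 1 (counting measure).
    """
    budget = Fraction(2) ** (n + tau)
    for l in range(2 * n + tau, n + tau - 1, -1):  # l = n + tau always qualifies
        if sum(min(Fraction(4) ** l * x * x, 1) for x in f) <= budget:
            return l
    raise ValueError("f is not in B_1")


def classification_sum(f, tau, n_max):
    """Partial sum over 0 <= n <= n_max of 2**(n - ell(f, n) + tau), cf. (19.52)."""
    return sum(Fraction(2) ** (n - ell(f, n, tau) + tau) for n in range(n_max + 1))


f1 = [Fraction(1, 2 ** k) for k in range(1, 11)]  # total mass 1 - 2**-10
f2 = [Fraction(1, 64)] * 64                       # mass spread over 64 atoms
print([ell(f2, n, 0) for n in range(8)], float(classification_sum(f2, 0, 12)))
```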


As the crude inequality (19.54) shows, the use of admissible sequences is not 
really canonical for a “classification result” such as Theorem 19.2.1 (one could 
consider sequences of partitions with a much smaller cardinality). This, however, 
suffices for the applications, and we have not yet found uses for sharper results. 


Proof of Proposition 10.14.3. We have to produce an admissible sequence of partitions $(\mathcal A_n)$ of $B_a$ and, for $n\ge 0$ and $A\in\mathcal A_n$, an integer $j_n(A)$ satisfying the conditions of Definition 10.14.1, where the quantity $S$ of (10.170) is $\le Lar$. Consider the admissible sequence $(\mathcal C_n)$ of $B_1$ obtained by application of Theorem 19.2.1 with $\tau = -2$ (when $\mu$ is the counting measure on $\mathbb N$). Consider the bijection $f\mapsto af$ between $B_1$ and $B_a$. Define the admissible sequence $(\mathcal A_n)$ of partitions of $B_a$ consisting of the sets $aC$ for $C\in\mathcal C_n$. Thus, if $A\in\mathcal A_n$, then $A/a\in\mathcal C_n$. Define $j_0(B_a)$ as the largest integer with $2a\le r^{-j_0(B_a)}$ (so that (10.169) holds). For $n\ge 1$ and $A\in\mathcal A_n$, define $\ell_n(A) = \ell_n(A/a)$ (recalling that $A/a\in\mathcal C_n$). Define $j_n(A)$ as the largest integer for which $ar^{j_n(A)}\le 2^{\ell_n(A)}$, so that $r^{-j_n(A)}\le ar2^{-\ell_n(A)}$. It then follows from (19.52) that for $f\in B_a$, we have $\sum_{n\ge 0}2^nr^{-j_n(A_n(f))}\le Lar$, so that, as desired, the quantity $S$ of (10.170) is $\le Lar$. Also, for $f\in A$, we have $f/a\in A/a$, so that, using (19.47) in the last inequality,
$$\varphi_{j_n(A)}(f,0) = \sum_{i\ge 1}\bigl(r^{2j_n(A)}|f_i|^2\wedge 1\bigr)\le\sum_{i\ge 1}\bigl(2^{2\ell_n(A)}|f_i/a|^2\wedge 1\bigr)\le 2^{n-2}\,.$$
Since the function $\varphi_j$ is the square of a distance, we have $\varphi_{j_n(A)}(f,g)\le 2^n$ for $f,g\in A_n$, and this proves (10.168). □
19.2.2 Subsets of $B_1$


To lighten notation, we write

$$a_n = \frac{1}{\sqrt{\log(n+1)}} . \tag{19.55}$$

To understand the main result of this section, Theorem 19.2.4 below, we have to keep in mind the following consequence of Theorem 2.11.9:

A set $T$ with $0 \in T$ and $\gamma_2(T, d) \le 1$ is (basically) a subset of the convex hull of a sequence $(x_n)$ with $\|x_n\| \le a_n$. (19.56)

This is really a structure theorem, giving in a sense a complete description of the sets $T$ with $\gamma_2(T, d) \le 1$. When furthermore $T \subset B_1 = \{ y \in \ell^2 ;\ \sum_{i \ge 1} |y_i| \le 1 \}$, we will obtain a much more precise description of $T$. Given a finite subset $I$ of $\mathbb{N}^* = \mathbb{N} \setminus \{0\}$ and a number $a > 0$, we define $B_2(I, a)$ as the set of elements of $\ell^2$ with support in $I$ and with $\ell^2$ norm $\le a$, that is,

$$B_2(I, a) = \Big\{ x \in \ell^2 ;\ i \notin I \Rightarrow x_i = 0 ;\ \sum_{i \in I} x_i^2 \le a^2 \Big\} . \tag{19.57}$$


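Theorem 19.2.4 Consider a subset $T$ of $\ell^2$. Assume that for a certain number $S$, we have $\gamma_2(T, d_2) \le S$ and $T \subset S B_1$. Then there exist sets $I_n \subset \mathbb{N}^*$ such that $\operatorname{card} I_n \le \log(n+1)$ with

$$T \subset LS\, \overline{\operatorname{conv}} \bigcup_{n \ge 1} B_2(I_n, a_n) , \tag{19.58}$$

where $\overline{\operatorname{conv}}\, A$ denotes the closed convex hull of $A$.
We start with some simple observations.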


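Lemma 19.2.5 Consider sets $I_n \subset \mathbb{N}^*$ such that

$$\forall n \ge 1 , \quad \operatorname{card} I_n \le \log(n+1) . \tag{19.59}$$

Consider independent standard Gaussian r.v.s $(g_i)_{i \ge 1}$. Then

$$\mathrm{E} \sup_{n \ge 1} a_n \Big( \sum_{i \in I_n} g_i^2 \Big)^{1/2} \le L . \tag{19.60}$$

Proof For each $i$, we have $\mathrm{E} \exp(g_i^2/4) \le 2$, so that for any set $I$,

$$\mathrm{E} \exp\Big( \frac{1}{4} \sum_{i \in I} g_i^2 \Big) \le 2^{\operatorname{card} I}$$

and, for $v \ge 8 \operatorname{card} I$,

$$\mathrm{P}\Big( \sum_{i \in I} g_i^2 \ge v \Big) \le 2^{\operatorname{card} I} \exp\Big( -\frac{v}{4} \Big) \le \exp\Big( -\frac{v}{8} \Big) .$$

Now, (19.59) implies that for $w^2 \ge 8$, we have, using the value (19.55) of $a_n$,

$$\mathrm{P}\Big( \sup_{n \ge 1} a_n \Big( \sum_{i \in I_n} g_i^2 \Big)^{1/2} \ge w \Big) \le \sum_{n \ge 1} \mathrm{P}\Big( \sum_{i \in I_n} g_i^2 \ge w^2 \log(n+1) \Big) \le \sum_{n \ge 1} \exp\Big( -\frac{w^2 \log(n+1)}{8} \Big) ,$$

and the last sum is $\le L \exp(-w^2/L)$ for $w$ large enough. □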
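A quick numerical experiment may help to see why (19.60) does not grow with the number of sets. The Python sketch below assumes, for illustration only, that the $I_n$ are disjoint consecutive blocks with $\operatorname{card} I_n = \lfloor \log(n+1) \rfloor$, so that (19.59) holds; disjointness lets us draw fresh Gaussians for each block. It estimates the left-hand side of (19.60) by Monte Carlo, truncating the supremum at a finite `n_max`, and it checks the identity $\mathrm{E}\exp(g^2/4) = \sqrt{2}$ used at the start of the proof by trapezoidal quadrature. The function names and parameter values are hypothetical choices.

```python
import math
import random


def sup_estimate(n_max=1000, trials=50, seed=0):
    """Monte Carlo estimate of E sup_{n <= n_max} a_n (sum_{i in I_n} g_i^2)^{1/2}.

    Assumes disjoint blocks I_n with card I_n = floor(log(n + 1)) and
    a_n = 1 / sqrt(log(n + 1)) as in (19.55).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        best = 0.0
        for n in range(1, n_max + 1):
            k = int(math.log(n + 1))  # card I_n <= log(n + 1)
            if k == 0:
                continue  # empty block contributes 0
            s = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
            best = max(best, math.sqrt(s / math.log(n + 1)))
        total += best
    return total / trials


def mgf_quarter(h=1e-3, lim=40.0):
    """Trapezoidal value of E exp(g^2/4) = (2 pi)^(-1/2) * integral of exp(-x^2/4)."""
    steps = int(round(2 * lim / h))
    vals = [math.exp(-((-lim + j * h) ** 2) / 4) for j in range(steps + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return integral / math.sqrt(2 * math.pi)


if __name__ == "__main__":
    print(sup_estimate(), mgf_quarter(), math.sqrt(2))
```

Increasing `n_max` adds more and more blocks, yet the estimate stays of order 1, which is the content of (19.60).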


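Exercise 19.2.6 If the sets $I_n$ satisfy $\operatorname{card} I_n \ge \log(n+1)$, prove that

$$\mathrm{E} \sup_{n \ge 1} \Big( \frac{1}{\operatorname{card} I_n} \sum_{i \in I_n} g_i^2 \Big)^{1/2} \le L .$$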


Exercise 19.2.7 Find another proof of Lemma 19.2.5 by constructing a sequence $(u_k)$ of $\ell^2$ with $\|u_k\|_2 \le L a_k$ and

$$\bigcup_{n \ge 1} B_2(I_n, a_n) \subset \operatorname{conv}\{ u_k ;\ k \ge 1 \} .$$

Hint: Recall Lemma 14.3.2. Use this for each ball $B_2(I, a)$ in the left-hand side above.
Exercise 19.2.8 We recall that for $T \subset \ell^2$, we write

$$g(T) = \mathrm{E} \sup_{t \in T} X_t = \mathrm{E} \sup_{t \in T} \sum_{i \ge 1} t_i g_i .$$

Consider subsets $T_n$ of $\ell^2$, and assume that for certain numbers $b_n$, we have $\|x\|_2 \le b_n$ for $x \in T_n$. Prove that

$$g\Big( \bigcup_{n \ge 1} T_n \Big) \le L \sup_{n \ge 1} \Big( g(T_n) + b_n \sqrt{\log(n+1)} \Big) . \tag{19.61}$$

Use (19.61) with $T_n = B_2(I_n, a_n)$ and $b_n = a_n$ to give another proof of Lemma 19.2.5.


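The following is a kind of converse to Theorem 19.2.4:

Proposition 19.2.9 Consider sets $I_n \subset \mathbb{N}^*$ with $\operatorname{card} I_n \le \log(n+1)$. Then the set $T_1 := \operatorname{conv} \bigcup_{n \ge 1} B_2(I_n, a_n)$ satisfies $\gamma_2(T_1, d_2) \le L$ and $T_1 \subset B_1$.

Proof For $x \in B_2(I_n, a_n)$, the Cauchy-Schwarz inequality gives $\sum_{i \in I_n} |x_i| \le \sqrt{\operatorname{card} I_n}\, a_n \le 1$, so that $B_2(I_n, a_n) \subset B_1$ and hence $T_1 \subset B_1$. It follows from Lemma 19.2.5 that $g(T_1) \le L$ and from Theorem 2.10.1 that $\gamma_2(T_1, d_2) \le L$. □

Thus, Theorem 19.2.4 in a sense provides a complete description of the sets $T \subset B_1$ for which $\gamma_2(T, d_2) \le 1$.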


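Proof of Theorem 19.2.4. By homogeneity, we may assume that $S = 1$. We denote by $\Delta_2(A)$ the diameter of $A$ for the distance $d_2$ induced by $\ell^2$. Since $\gamma_2(T, d_2) \le 1$, we may consider an admissible sequence $(\mathcal{B}_n)_{n \ge 0}$ with

$$\sup_{t \in T} \sum_{n \ge 0} 2^{n/2} \Delta_2(B_n(t)) \le 2 . \tag{19.62}$$

Next, we take advantage of the fact that $T \subset B_1$. We consider the admissible sequence $(\mathcal{C}_n)$ provided by Theorem 19.2.1 when $\tau = 0$, $\Omega = \mathbb{N}^*$, and $\mu$ is the counting measure. We consider the increasing sequence of partitions $(\mathcal{A}_n)_{n \ge 0}$, where $\mathcal{A}_n$ is generated by $\mathcal{B}_n$ and $\mathcal{C}_n$, so $\operatorname{card} \mathcal{A}_n \le N_{n+1}$. The numbers $\ell(t, n)$ of (19.46) depend only on $A_n(t)$. Therefore,

$$s \in A_n(t) \Rightarrow \ell(s, n) = \ell(t, n) . \tag{19.63}$$

For every $A \in \mathcal{A}_n$, we pick an arbitrary element $v_n(A) = (v_{n,i}(A))_{i \ge 1}$ of $A$. We set

$$J_n(A) = \{ i \in \mathbb{N}^* ;\ |v_{n,i}(A)| \ge 2^{-\ell(v_n(A), n)} \} ,$$

so that $\operatorname{card} J_n(A) \le 2^n$ by (19.49) since $\mu$ is the counting measure, and we define

$$J'_n(A) = \bigcup \{ J_k(B) ;\ k < n ,\ B \in \mathcal{A}_k ,\ A \subset B \} .$$

For $n \ge 1$ and $A \in \mathcal{A}_n$, we set $I_n(A) = J_n(A) \setminus J'_n(A)$, so that $\operatorname{card} I_n(A) \le 2^n$. We define $I_0(T) = J_0(T)$ and $\mathcal{F}$ as the family of pairs $(I_n(A), 2^{-n/2})$ for $A \in \mathcal{A}_n$ and $n \ge 0$. The heart of the argument is to prove that

$$T \subset L\, \overline{\operatorname{conv}} \bigcup_{(I, a) \in \mathcal{F}} B_2(I, a) . \tag{19.64}$$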
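To prove this, let us fix $t \in T$ and for $n \ge 1$ define

$$I_n(t) = I_n(A_n(t)) = J_n(A_n(t)) \setminus \bigcup_{k < n} J_k(A_k(t)) \tag{19.65}$$

and $\pi_n(t) = v_n(A_n(t))$. Since $\pi_n(t) \in A_n(t)$, it follows from (19.63) that

$$\ell(\pi_n(t), n) = \ell(t, n) . \tag{19.66}$$

Thinking of a point $t \in \ell^2$ as a function from $\mathbb{N}^*$ to $\mathbb{R}$, for $I \subset \mathbb{N}^*$, we consider the element $t\mathbf{1}_I \in \ell^2$ given by

$$t\mathbf{1}_I = (t_i \mathbf{1}_I(i))_{i \ge 1} . \tag{19.67}$$

For $i \in I_n(t)$, we have $i \notin J_{n-1}(A_{n-1}(t))$, so that by definition of this set $|v_{n-1,i}(A_{n-1}(t))| < 2^{-\ell(t, n-1)}$. Since $\pi_{n-1}(t) = v_{n-1}(A_{n-1}(t))$, we have $|\pi_{n-1}(t)_i| = |v_{n-1,i}(A_{n-1}(t))| < 2^{-\ell(t, n-1)}$. We have proved that

$$\|\pi_{n-1}(t) \mathbf{1}_{I_n(t)}\|_\infty \le 2^{-\ell(t, n-1)} , \tag{19.68}$$

so that $\|\pi_{n-1}(t) \mathbf{1}_{I_n(t)}\|_2 \le 2^{n/2 - \ell(t, n-1)}$ since $\operatorname{card} I_n(t) = \operatorname{card} I_n(A_n(t)) \le 2^n$. Since $t, \pi_{n-1}(t) \in A_{n-1}(t)$, we have $\|t\mathbf{1}_{I_n(t)} - \pi_{n-1}(t)\mathbf{1}_{I_n(t)}\|_2 \le \|t - \pi_{n-1}(t)\|_2 \le \Delta_2(A_{n-1}(t))$ and thus,

$$\|t\mathbf{1}_{I_n(t)}\|_2 \le c(t, n) := \Delta_2(A_{n-1}(t)) + 2^{n/2 - \ell(t, n-1)} . \tag{19.69}$$

Therefore,

$$t\mathbf{1}_{I_n(t)} \in 2^{n/2} c(t, n)\, B_2(I_n(t), 2^{-n/2}) . \tag{19.70}$$

For each $t \in T$, we define $c(t, 0) = 1$ and $I_0(t) = I_0(T)$. Since $t \in T \subset B_1$ and $\operatorname{card} I_0(t) = \operatorname{card} J_0(T) \le 2^0 = 1$, (19.70) also holds for $n = 0$. We claim now that

$$t = \sum_{n \ge 0} t\mathbf{1}_{I_n(t)} . \tag{19.71}$$

We first show that

$$|t_i| > 0 \Rightarrow i \in \bigcup_{n \ge 0} J_n(A_n(t)) . \tag{19.72}$$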
To prove this, consider $i$ with $|t_i| > 0$ and $n$ large enough so that $\Delta_2(A_n(t)) \le |t_i|/2$. Then for all $x \in A_n(t)$, we have $|x_i - t_i| \le |t_i|/2$ and hence $|x_i| \ge |t_i|/2$. Recalling that (19.48) holds for $\tau = 0$, we have in particular $2^{n - \ell(x, n)} \le 18 \le 2^5$, so that $\ell(x, n) \ge n - 5$. Since $2^{-\ell(x, n)} \le 2^{5-n}$ while $|x_i| \ge |t_i|/2$, for $n$ large enough, for all $x \in A_n(t)$, we have $2^{-\ell(x, n)} \le |x_i|$. This holds in particular for $x = \pi_n(t) = v_n(A_n(t))$. Thus, by definition of $J_n(A)$, this shows that $i \in J_n(A_n(t))$.

It follows from (19.72) that if $|t_i| > 0$, there is a smallest $n \ge 0$ such that $i \in J_n(A_n(t))$. If $n = 0$, then $i \in J_0(T) = I_0(T)$. If $n > 0$, then (19.65) implies that $i \in I_n(t)$. Furthermore, the sets $I_n(t)$ are disjoint. We have proved (19.71). Combining (19.71) and (19.70), we have

$$t = \sum_{n \ge 0} t\mathbf{1}_{I_n(t)} = \sum_{n \ge 0} 2^{n/2} c(t, n)\, u(n) , \tag{19.73}$$

where $u(n) \in B_2(I_n(t), 2^{-n/2})$. Furthermore, $\sum_{n \ge 0} 2^{n/2} c(t, n) \le L$ by (19.62) and (19.48), so the relation (19.73) proves (19.64).

It remains to deduce (19.58) from (19.64). This tedious argument simply requires a cautious enumeration of the pairs $(I, a) \in \mathcal{F}$ as follows. Consider the set $\mathcal{Z}_n$ consisting of all the sets of the type $I_n(A)$ for $A \in \mathcal{A}_n$, so that $\operatorname{card} \mathcal{Z}_n \le \operatorname{card} \mathcal{A}_n \le N_{n+1}$. We then find a sequence $(I_k)_{k \ge 1}$ of sets with the following properties. First, $I_k = \emptyset$ if $k \le N_1$. Next, for $n \ge 0$, $\mathcal{Z}_n = \{ I_k ;\ N_{n+1} < k \le N_{n+2} \}$.$^{7}$ This is possible because $\operatorname{card} \mathcal{Z}_n \le N_{n+1} \le N_{n+2} - N_{n+1}$. Furthermore, $\operatorname{card} I_k \le 2^n$ for $N_{n+1} < k \le N_{n+2}$ since $\operatorname{card} I \le 2^n$ for $I \in \mathcal{Z}_n$.

Thus for $N_{n+1} < k \le N_{n+2}$, we have

$$\operatorname{card} I_k \le 2^n \le 2^{n+1} \log 2 = \log N_{n+1} \le \log(k+1) \le 2^{n+2} . \tag{19.74}$$

This proves that for all $k$, we have $\operatorname{card} I_k \le \log(k+1)$. Consider now $(I, a) \in \mathcal{F}$. We prove that for some $k$, we have $I = I_k$ and $a \le L a_k$, which obviously concludes the proof. By definition of $\mathcal{F}$, there exists $n \ge 0$ such that $I \in \mathcal{Z}_n$ and $a = 2^{-n/2}$, so that by our construction $I = I_k$ for some $k$ with $N_{n+1} < k \le N_{n+2}$, and $a = 2^{-n/2}$ satisfies $a \le 2/\sqrt{\log(k+1)} = 2 a_k$ by the last inequality of (19.74). □
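The inequalities in (19.74) are elementary but easy to get wrong by a factor of 2, so a mechanical check may reassure the reader. The sketch below assumes only $N_n = 2^{2^n}$. Since $\log(k+1)$ is increasing in $k$, it is enough to test the two ends of each block $N_{n+1} < k \le N_{n+2}$; we stop at $n = 4$ because beyond that the gap between $\log(N_{n+1} + 2)$ and $\log N_{n+1}$ falls below double precision. The function name is ours.

```python
import math


def N(n):
    """N_n = 2^(2^n)."""
    return 2 ** (2 ** n)


def check_block(n):
    """Check (19.74) and 2^(-n/2) <= 2 a_k on the block N_{n+1} < k <= N_{n+2}."""
    k_first, k_last = N(n + 1) + 1, N(n + 2)
    log_first = math.log(k_first + 1)  # smallest value of log(k + 1) on the block
    log_last = math.log(k_last + 1)    # largest value of log(k + 1) on the block
    lower_chain = 2 ** n <= 2 ** (n + 1) * math.log(2) <= log_first
    upper = log_last <= 2 ** (n + 2)
    a_k_min = 1 / math.sqrt(log_last)  # a_k = 1/sqrt(log(k + 1)) is smallest at k_last
    radius = 2 ** (-n / 2) <= 2 * a_k_min
    return lower_chain and upper and radius


if __name__ == "__main__":
    print([check_block(n) for n in range(5)])
```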


Numerous relations exist between the following properties of a set $T$: $\gamma_2(T, d_2) \le 1$; $T \subset B_1$; $\gamma_1(T, d_\infty) \le 1$ (where $d_\infty$ denotes the distance associated with the supremum norm). We started exploring this theme in Chap. 6. For example, the essence of Theorem 19.2.10 below is that the conditions $T \subset B_1$ and $\gamma_1(T, d_\infty) \le 1$ taken together are very restrictive. We pursue this direction in the rest of this section, a circle of ideas closely connected to the investigations of Sect. 19.3.1 below.

For $I \subset \mathbb{N}^*$ and $a > 0$, in the spirit of the definition (19.57) of $B_2(I, a)$, we define $B_\infty(I, a)$ as the set of elements with support in $I$ and with $\ell^\infty$ norm $\le a$, that is,

$$B_\infty(I, a) = \{ x = (x_i)_{i \ge 1} ;\ i \notin I \Rightarrow x_i = 0 ;\ \forall i \in I ,\ |x_i| \le a \} . \tag{19.75}$$

$^{7}$ The sets $I_k$ are not required to be all different from each other.
We have

$$x \in B_\infty(I, a) \Rightarrow \sum_{i \ge 1} x_i^2 \le a^2 \operatorname{card} I ,$$

and thus, recalling the sets $B_2(I, a)$ of (19.57), this implies

$$B_\infty(I, a) \subset B_2\big( I, a \sqrt{\operatorname{card} I} \big) . \tag{19.76}$$

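Theorem 19.2.10 Consider a set $T \subset S B_1$, and assume that $\gamma_1(T, d_\infty) \le S$. Then we can find subsets $I_n$ of $\mathbb{N}^*$ with $\operatorname{card} I_n \le \log(n+1)$, for which

$$T \subset LS\, \overline{\operatorname{conv}} \bigcup_{n \ge 1} B_\infty\Big( I_n, \frac{1}{\log(n+1)} \Big) . \tag{19.77}$$

Proof Replacing $T$ by $T/S$, we may assume that $S = 1$. We proceed as in the proof of Theorem 19.2.4, but, denoting by $\Delta_\infty$ the diameter for $d_\infty$, we may now assume

$$\forall t \in T , \quad \sum_{n \ge 0} 2^n \Delta_\infty(A_n(t)) \le 2 .$$

Using (19.68) rather than (19.69), we get

$$\|t\mathbf{1}_{I_n(t)}\|_\infty \le c(t, n) := \Delta_\infty(A_{n-1}(t)) + 2^{-\ell(t, n-1)} ,$$

so that

$$t\mathbf{1}_{I_n(t)} \in 2^n c(t, n)\, B_\infty(I_n(t), 2^{-n}) ,$$

and the proof is finished exactly as before. □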
Corollary 19.2.11 If $T \subset S B_1$ and $\gamma_1(T, d_\infty) \le S$, then $\gamma_2(T, d_2) \le LS$.

Proof Indeed (19.76) and (19.77) imply that $T \subset LS\, \overline{\operatorname{conv}} \bigcup_{n \ge 1} B_2(I_n, a_n)$, and Lemma 19.2.5 shows that this implies that $\gamma_2(T, d_2) \le LS$. □

The information provided by (19.77) is, however, very much stronger than the information $\gamma_2(T, d_2) \le LS$.


19.2.3 1-Unconditional Sequences and Gaussian Measures


Definition 19.2.12 A sequence $(e_i)_{i \le N}$ of vectors of a Banach space $X$ is 1-unconditional if for all numbers $(a_i)_{i \le N}$ and signs $(\varepsilon_i)_{i \le N}$ we have

$$\Big\| \sum_{i \le N} a_i e_i \Big\| = \Big\| \sum_{i \le N} \varepsilon_i a_i e_i \Big\| .$$

Exercise 19.2.13 Prove that a sequence $(e_i)_{i \le N}$ in $\mathbb{R}^N$ provided with the Euclidean norm is 1-unconditional if and only if it is orthogonal.

There are many natural norms on $\mathbb{R}^N$ for which the canonical basis $(e_i)_{i \le N}$ is 1-unconditional, for example, the norms $\|\cdot\|_p$ for $p \ge 1$ given for $x = (x_i)_{i \le N}$ by $\|x\|_p^p = \sum_{i \le N} |x_i|^p$, or the norms given by the formula (19.82) below.
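To make Definition 19.2.12 concrete, here is a small Python check, for illustration only, that compares $\|\sum_i a_i e_i\|$ with $\|\sum_i \varepsilon_i a_i e_i\|$ over all sign patterns for one fixed choice of coefficients. It confirms the sign invariance for the canonical basis under $\|\cdot\|_p$, and it exhibits a non-orthogonal pair in the Euclidean plane that fails it, in line with Exercise 19.2.13. A single coefficient vector can only refute 1-unconditionality, never prove it; the helper names are ours.

```python
import itertools


def lp_norm(x, p):
    """The l_p norm (sum |x_i|^p)^(1/p) on R^N."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def combine(vectors, coeffs):
    """Return sum_i coeffs[i] * vectors[i] as a list of coordinates."""
    dim = len(vectors[0])
    return [sum(c * v[d] for c, v in zip(coeffs, vectors)) for d in range(dim)]


def sign_invariant(norm, vectors, coeffs, tol=1e-12):
    """Check ||sum a_i e_i|| == ||sum eps_i a_i e_i|| for every sign pattern eps."""
    base = norm(combine(vectors, coeffs))
    for eps in itertools.product((1, -1), repeat=len(coeffs)):
        flipped = [s * a for s, a in zip(eps, coeffs)]
        if abs(norm(combine(vectors, flipped)) - base) > tol:
            return False
    return True


if __name__ == "__main__":
    canonical = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(sign_invariant(lambda x: lp_norm(x, 3), canonical, [1.0, -2.0, 0.5]))
    skew = [[1, 0], [1, 1]]  # not orthogonal
    print(sign_invariant(lambda x: lp_norm(x, 2), skew, [1.0, 1.0]))
```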

The main result of this section is Theorem 19.2.15 below. It is in a sense a dual version of Theorem 19.2.4. Our next result, which does not involve unconditionality, provides perspective on this result. It is in a sense a dual version of (19.56). We recall the notation $a_n := 1/\sqrt{\log(n+1)}$.

Theorem 19.2.14 Consider elements $(e_i)_{i \le N}$ of a Banach space$^{8}$ $X$ and $S = \mathrm{E}\|\sum_{i \le N} g_i e_i\|$. Then we can find a sequence $(x_n^*)$ of elements of $X^*$ such that for each $n \ge 1$, we have

$$\Big( \sum_{i \le N} x_n^*(e_i)^2 \Big)^{1/2} \le a_n \tag{19.78}$$

and

$$\forall x \in X , \quad \|x\| \le LS\, N(x) , \tag{19.79}$$

where $N(x) := \sup_{n \ge 1} |x_n^*(x)|$.

The point of this result is that, using Proposition 2.11.6,

$$\mathrm{E}\, S N\Big( \sum_{i \le N} g_i e_i \Big) \le LS = L\, \mathrm{E}\Big\| \sum_{i \le N} g_i e_i \Big\| .$$

In words, given a norm $\|\cdot\|$, if we are only interested in the quantity $S = \mathrm{E}\|\sum_{i \le N} g_i e_i\|$, our norm in a sense does not differ from a larger norm (see (19.79)) of the type $SN$, where $N(x) = \sup_n |x_n^*(x)|$ for a sequence $(x_n^*)$ that satisfies (19.78). Thus, Theorem 19.2.14 is a structure theorem, just as its dual version (19.56). If $\mathrm{E}\|\sum_{i \le N} g_i e_i\|$ is the only characteristic of the norm in which we are interested, we basically have a complete understanding of this norm.


Proof of Theorem 19.2.14. Denoting by $x^*$ an element of the dual $X^*$, consider the set of sequences

$$T = \{ (x^*(e_i))_{i \le N} ;\ \|x^*\| \le 1 \} \subset \mathbb{R}^N .$$

$^{8}$ The reason for the change of notation is that the results of this chapter have a natural extension when the finite sequence $(e_i)_{i \le N}$ is replaced by an infinite sequence which is a basis of $X$. We refer to [132] for this.

As usual, for a sequence $t = (t_i)_{i \le N} \in \mathbb{R}^N$, we write $X_t = \sum_{i \le N} t_i g_i$. Thus,

$$S = \mathrm{E}\Big\| \sum_{i \le N} g_i e_i \Big\| = \mathrm{E} \sup_{\|x^*\| \le 1} \sum_{i \le N} x^*(e_i) g_i = \mathrm{E} \sup_{t \in T} X_t .$$

Consider then a countable subset $T'$ of $T$ such that for all $(x_i)_{i \le N} \in \mathbb{R}^N$,

$$\sup \Big\{ \sum_{i \le N} t_i x_i ;\ (t_i) \in T \Big\} = \sup \Big\{ \sum_{i \le N} t_i x_i ;\ (t_i) \in T' \Big\} . \tag{19.80}$$

We apply Theorem 2.11.9 to $T'$ to obtain a sequence $y_n = (y_{n,i})_{i \le N}$ with $\|y_n\|_2 \le a_n$ and $T' \subset LS \operatorname{conv}\{ y_n ;\ n \ge 1 \}$, where $y_n$ is moreover a multiple of the difference of two elements of $T'$. Thus,

$$\sup \Big\{ \sum_{i \le N} t_i x_i ;\ (t_i) \in T' \Big\} \le LS \sup \Big\{ \sum_{i \le N} y_{n,i} x_i ;\ n \ge 1 \Big\} .$$

Since $y_n$ is a multiple of the difference of two elements of $T'$, there exists $x_n^*$ in $X^*$ with $y_n = (x_n^*(e_i))_{i \le N}$, that is, $y_{n,i} = x_n^*(e_i)$. Thus (19.78) follows from the fact that $\|y_n\|_2 \le a_n$. Moreover, when $x = \sum_{i \le N} x_i e_i$, we obtain from (19.1) that

$$\begin{aligned}
\|x\| &= \sup \Big\{ \sum_{i \le N} x^*(e_i) x_i ;\ \|x^*\| \le 1 \Big\} = \sup \Big\{ \sum_{i \le N} t_i x_i ;\ t \in T \Big\} \\
&= \sup \Big\{ \sum_{i \le N} t_i x_i ;\ t \in T' \Big\} \le LS \sup \Big\{ \sum_{i \le N} y_{n,i} x_i ;\ n \ge 1 \Big\} \\
&= LS \sup \Big\{ \sum_{i \le N} x_n^*(e_i) x_i ;\ n \ge 1 \Big\} = LS \sup_{n \ge 1} x_n^*(x) \le LS \sup_{n \ge 1} |x_n^*(x)| .
\end{aligned}$$

This proves (19.79) and finishes the argument. □


Suppose now that the sequence $(e_i)_{i \le N}$ is 1-unconditional. Then Theorem 19.2.14 is not satisfactory because the sequence $(e_i)_{i \le N}$ need not be 1-unconditional for the norm $N$ produced by this theorem. We provide a version of Theorem 19.2.14 which is adapted to the case where the sequence $(e_i)_{i \le N}$ is 1-unconditional.

Theorem 19.2.15 Consider a 1-unconditional sequence $(e_i)_{i \le N}$ in a Banach space $X$, and let $S = \mathrm{E}\|\sum_{i \le N} g_i e_i\|$. Then we can find a sequence $(I_n)$ of subsets of $\{1, \ldots, N\}$ satisfying (19.59) and

$$\forall x \in X ,\ x = \sum_{i \le N} x_i e_i , \quad \|x\| \le LS \sup_{n \ge 1} a_n \Big( \sum_{i \in I_n} x_i^2 \Big)^{1/2} . \tag{19.81}$$
To explain the meaning of this result, let us assume that the sequence $(e_i)_{i \le N}$ spans $X$, and when $x = \sum_{i \le N} x_i e_i$, let us define the new norm

$$N(x) = \sup_{n \ge 1} a_n \Big( \sum_{i \in I_n} x_i^2 \Big)^{1/2} . \tag{19.82}$$

The sequence $(e_i)_{i \le N}$ is 1-unconditional for this norm, and (19.81) implies $\|x\| \le LS\, N(x)$. Moreover, Lemma 19.2.5 implies that $\mathrm{E} N(\sum_{i \le N} g_i e_i) \le L$. In words, given the 1-unconditional sequence $(e_i)_{i \le N}$, if we are only interested in the quantity $S = \mathrm{E}\|\sum_{i \le N} g_i e_i\|$, we can replace our norm by a larger norm of the type $SN$, for which the sequence $(e_i)_{i \le N}$ is still 1-unconditional. Again, this should be viewed as a structure theorem.
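For readers who like to experiment, the following Python sketch evaluates the norm $N$ of (19.82) for a hypothetical choice of sets: disjoint consecutive blocks with $\operatorname{card} I_n = \lfloor \log(n+1) \rfloor$, which satisfy (19.59). It checks on random vectors that $N$ is invariant under sign changes of the coordinates (so that $(e_i)$ is 1-unconditional for $N$) and that it satisfies the triangle inequality. The block construction and the function names are illustrative assumptions; in Theorem 19.2.15 the sets $I_n$ are produced by Theorem 19.2.4.

```python
import math
import random


def make_blocks(N):
    """Hypothetical sets I_n: disjoint consecutive blocks of {0, ..., N-1}
    with card I_n = floor(log(n + 1)) (I_1 is empty since log 2 < 1)."""
    blocks, start, n = {}, 0, 1
    while start < N:
        k = int(math.log(n + 1))
        blocks[n] = list(range(start, min(start + k, N)))
        start += k
        n += 1
    return blocks


def norm_N(x, blocks):
    """N(x) = sup_n a_n (sum_{i in I_n} x_i^2)^{1/2} with a_n = 1/sqrt(log(n + 1))."""
    return max(
        math.sqrt(sum(x[i] ** 2 for i in idx) / math.log(n + 1))
        for n, idx in blocks.items()
    )


if __name__ == "__main__":
    blocks = make_blocks(40)
    rng = random.Random(1)
    x = [rng.uniform(-1, 1) for _ in range(40)]
    print(norm_N(x, blocks))
```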


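Exercise 19.2.16 In the statement of Theorem 19.2.15, prove that one may instead request $\operatorname{card} I_n \ge \log(1+n)$ and replace (19.81) by

$$\|x\| \le LS \sup_{n \ge 1} \Big( \frac{1}{\operatorname{card} I_n} \sum_{i \in I_n} x_i^2 \Big)^{1/2} .$$

We start the proof of Theorem 19.2.15 with a simple observation.

Lemma 19.2.17 Consider a 1-unconditional sequence $(e_i)_{i \le N}$ and $S = \mathrm{E}\|\sum_{i \le N} g_i e_i\|$. Then the set

$$T = \{ (x^*(e_i))_{i \le N} ;\ x^* \in X^* ,\ \|x^*\| \le 1 \} \subset \mathbb{R}^N \tag{19.83}$$

satisfies

$$\forall t \in T , \quad \sum_{i \le N} |t_i| \le 2S . \tag{19.84}$$

Proof Denote by $\eta_i$ the sign of $g_i x^*(e_i)$, so that

$$\sum_{i \le N} |g_i| |x^*(e_i)| = \sum_{i \le N} |x^*(g_i e_i)| = \sum_{i \le N} \eta_i x^*(g_i e_i) = x^*\Big( \sum_{i \le N} \eta_i g_i e_i \Big) \le \Big\| \sum_{i \le N} \eta_i g_i e_i \Big\| = \Big\| \sum_{i \le N} g_i e_i \Big\| .$$

Taking expectation completes the proof since $\mathrm{E}|g_i| = \sqrt{2/\pi} \ge 1/2$. □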


Proof of Theorem 19.2.15. We recall the set $T$ of (19.83). Lemma 19.2.17 implies that $T \subset 2S B_1$. (This is the only place where the fact that the sequence $(e_i)_{i \le N}$ is unconditional is used.) Moreover, Theorem 2.10.1 implies that $\gamma_2(T, d_2) \le L g(T)$, whereas

$$g(T) = \mathrm{E} \sup_{\|x^*\| \le 1} x^*\Big( \sum_{i \le N} g_i e_i \Big) = \mathrm{E}\Big\| \sum_{i \le N} g_i e_i \Big\| = S .$$

Theorem 19.2.4 provides sets $I_n$ that satisfy (19.59) and $T \subset LS\, T_1$, where

$$T_1 = \overline{\operatorname{conv}} \bigcup_{n \ge 1} B_2(I_n, a_n) .$$

Thus, by duality, if $x = \sum_{i \le N} x_i e_i$, we have, using the Cauchy-Schwarz inequality in the last step,

$$\begin{aligned}
\|x\| &= \sup_{t \in T} \sum_{i \le N} t_i x_i \le LS \sup_{t \in T_1} \sum_{i \le N} t_i x_i \le LS \sup_{n \ge 1} \sup_{t \in B_2(I_n, a_n)} \sum_{i \le N} t_i x_i \\
&= LS \sup_{n \ge 1} \sup_{t \in B_2(I_n, a_n)} \sum_{i \in I_n} t_i x_i \le LS \sup_{n \ge 1} a_n \Big( \sum_{i \in I_n} x_i^2 \Big)^{1/2} ,
\end{aligned}$$

and this proves (19.81). □


The following exercise is similar to Theorem 19.2.15 but for r.v.s with exponential tails rather than Gaussian ones:

Exercise 19.2.18 Assume that the r.v.s $Y_i$ are independent and symmetric and satisfy $\mathrm{P}(|Y_i| \ge x) = \exp(-x)$. Consider a 1-unconditional sequence $(e_i)_{i \le N}$ in a Banach space $E$, and let $S = \mathrm{E}\|\sum_{i \le N} Y_i e_i\|$. Prove that we can find a sequence $(I_n)$ of subsets of $\{1, \ldots, N\}$ with $\operatorname{card} I_n \le \log(n+1)$ and

$$\forall x \in E ,\ x = \sum_{i \le N} x_i e_i , \quad \|x\| \le LS \sup_{n \ge 1} \frac{1}{\log(n+1)} \sum_{i \in I_n} |x_i| .$$

Hint: Use Theorem 8.3.3.


19.3 Probabilistic Constructions


To prove the existence of an object with given properties, the probabilistic method exhibits a random object for which one can prove through probabilistic estimates that it has the required properties with positive probability.$^{9}$

$^{9}$ There are many situations where this method applies but where one does not know how to exhibit any explicit object with these properties.
19.3.1 Restriction of Operators


Consider $q \ge 1$, the space $\ell^N_q$, and its canonical basis $(e_i)_{i \le N}$. Consider a Banach space $X$ and an operator $U : \ell^N_q \to X$. We will use many times the trivial observation that such an operator is entirely determined by the elements $x_i = U(e_i)$ of $X$. Our goal is to give in Theorem 19.3.1 below (surprisingly mild) conditions under which there are large subsets $J$ of $\{1, \ldots, N\}$ such that the norm $\|U_J\|$ of the restriction $U_J$ to the span of the elements $(e_i)_{i \in J}$ is much smaller than the norm of $U$. We first compute this norm. We denote by $X_1^*$ the unit ball of the dual of $X$, and by $p$ the conjugate exponent of $q$. Setting $x_i = U(e_i)$, we have

$$\begin{aligned}
\|U_J\| &= \sup \Big\{ x^*\Big( \sum_{i \in J} a_i x_i \Big) ;\ \sum_{i \in J} |a_i|^q \le 1 ,\ x^* \in X_1^* \Big\} \\
&= \sup \Big\{ \Big( \sum_{i \in J} |x^*(x_i)|^p \Big)^{1/p} ;\ x^* \in X_1^* \Big\} .
\end{aligned} \tag{19.85}$$

The set $J$ will be constructed by a random choice. Specifically, given a number $0 < \delta \le 1$, we consider (as in Sect. 11.11) i.i.d. r.v.s $(\delta_i)_{i \le N}$ with

$$\mathrm{P}(\delta_i = 1) = \delta , \quad \mathrm{P}(\delta_i = 0) = 1 - \delta , \tag{19.86}$$

and we set $J = \{ i \le N ;\ \delta_i = 1 \}$. Thus (19.85) implies

$$\|U_J\|^p = \sup_{t \in T} \sum_{i \le N} \delta_i |t_i|^p , \tag{19.87}$$

where

$$T = \{ (x^*(x_i))_{i \le N} ;\ x^* \in X_1^* \} \subset \mathbb{R}^N . \tag{19.88}$$

Setting

$$|T|^p = \{ (|t_i|^p)_{i \le N} ;\ t \in T \} \subset \mathbb{R}^N ,$$

we may rewrite (19.87) as

$$\|U_J\|^p = \sup_{t \in |T|^p} \sum_{i \le N} \delta_i t_i . \tag{19.89}$$

This brings forward the essential point: To control $\mathrm{E}\|U_J\|^p$, we need information on the set $|T|^p$. However, information we might gather from the properties of $X$ as a Banach space is likely to bear on $T$ rather than $|T|^p$. The link between the properties of $T$ and $|T|^p$ is provided by Theorem 19.3.2 below, which transfers a certain “smallness” property of $T$ (reflected by (19.96) below) into an appropriate smallness property of $|T|^p$ (witnessed by (19.99) below).

Let us start with an obvious observation: Interchanging the supremum and the expectation yields

$$\mathrm{E}\|U_J\|^p \ge \sup_{t \in T} \mathrm{E}\Big( \sum_{i \le N} \delta_i |t_i|^p \Big) = \delta \sup_{t \in T} \sum_{i \le N} |t_i|^p . \tag{19.90}$$

This demonstrates the relevance of the quantity

$$\sup_{t \in T} \sum_{i \le N} |t_i|^p = \sup_{\|x^*\| \le 1} \sum_{i \le N} |x^*(x_i)|^p . \tag{19.91}$$

We can think of this quantity as an obstacle to making $\mathrm{E}\|U_J\|^p$ small. It might sometimes be to our advantage to change the norm (as little as we can) to decrease this obstacle (of a somewhat uninteresting nature). For this, given a number $C > 0$, we denote by $\|\cdot\|_C$ the norm on $X$ such that the unit ball of the dual norm is (bearing in mind that $\|x^*\|$ is the dual norm of $x^*$)

$$X_{1,C}^* := \Big\{ x^* \in X^* ;\ \|x^*\| \le 1 ,\ \sum_{i \le N} |x^*(x_i)|^p \le C \Big\} , \tag{19.92}$$

and we denote by $\|U\|_C$ the operator norm of $U$ when $X$ is provided with the norm $\|\cdot\|_C$. This definition is tailored so that for the norm $\|\cdot\|_C$, the quantity (19.91) is now $\le C$. Another very nice feature is that the set

$$T_C = \{ (x^*(x_i))_{i \le N} ;\ x^* \in X_{1,C}^* \} \subset \mathbb{R}^N \tag{19.93}$$

of (19.88) corresponding to the new norm is a subset of the set $T = \{ (x^*(x_i))_{i \le N} ;\ x^* \in X_1^* \}$ corresponding to the original norm. We will then be able to prove that $T_C$ is small in the sense of (19.96) below simply because $T$ is already small in this sense. This will be done by using the geometric properties of the original norm, and we shall not have to be concerned with the geometric properties of the norm $\|\cdot\|_C$.

We are now ready to bound the operator norm of a random restriction $U_J$ of $U$.


Theorem 19.3.1 Consider $1 < q \le 2$ and its conjugate exponent $p \ge 2$. Consider a Banach space $X$ such that $X^*$ is $p$-convex (see Definition 4.1.2). Then there exists a number $K(p, \eta)$ depending only on $p$ and on the constant $\eta$ in Definition 4.1.2 with the following property. Consider elements $x_1, \ldots, x_N$ of $X$, and $S = \max_{i \le N} \|x_i\|$. Denote by $U$ the operator $\ell^N_q \to X$ such that $U(e_i) = x_i$. Consider a number $C$ and define $B = \max(K(p, \eta) S^p \log N, C)$. Assume that for some $\varepsilon > 0$

$$\delta \le \frac{S^p}{B \varepsilon N^\varepsilon} . \tag{19.94}$$

Consider r.v.s $(\delta_i)_{i \le N}$ as in (19.86) and $J = \{ i \le N ;\ \delta_i = 1 \}$. Then

$$\mathrm{E}\|U_J\|_C^p \le K(p, \eta) \frac{S^p}{\varepsilon} . \tag{19.95}$$

It is remarkable that the right-hand side of (19.95) does not depend on $\|U\|_C$ but only on $S = \max_{i \le N} \|U(e_i)\|$. In the situations of interest, $S$ will be much smaller than $\|U\|_C$, so that (19.95) brings information. The condition (19.94) is not very intuitive at first, but the reader will find in Lemma 19.3.9 below two specific examples of application.

There are three steps in the proof.


• We use geometry to show that the set $T$ of (19.88) “is not too large”. This is Theorem 19.3.4 below.

• We transfer this control of $T$ to $|T|^p$. This is Theorem 19.3.2 below.

• The structure result obtained for $|T|^p$ in the previous step is perfectly adapted to obtain a statement of the same nature as Theorem 19.3.1. This is Theorem 19.3.3 below.


We first perform the second of the previous steps, which is closely related to Theorem 19.2.10. We recall from (19.75) that for a subset $I$ of $\{1, \ldots, N\}$ and for $a > 0$, we write

$$B_\infty(I, a) = \{ (t_i)_{i \le N} ;\ i \notin I \Rightarrow t_i = 0 ,\ \forall i \in I ,\ |t_i| \le a \} \subset \mathbb{R}^N .$$

Theorem 19.3.2 Consider a subset $T$ of $\mathbb{R}^N$ with $0 \in T$. Assume that for a certain number $A > 0$, there exists an admissible sequence $(\mathcal{B}_n)$ of $T$ such that$^{10}$

$$\forall t \in T , \quad \sum_{n \ge 0} 2^n \Delta(B_n(t), d_\infty)^p \le A , \tag{19.96}$$

and let

$$B = \max\Big( A, \sup_{t \in T} \sum_{i \le N} |t_i|^p \Big) . \tag{19.97}$$

Then we can find a sequence $(I_k)_{k \ge 1}$ of subsets of $\{1, \ldots, N\}$ with

$$\operatorname{card} I_k \le \frac{LB}{A} \log(k+1) \tag{19.98}$$

and

$$|T|^p \subset K(p) A \operatorname{conv} \bigcup_{k \ge 1} B_\infty\Big( I_k, \frac{1}{\log(k+1)} \Big) . \tag{19.99}$$

$^{10}$ In the language of the functionals $\gamma_{\alpha,\beta}$ of (4.5), condition (19.96) basically states that $\gamma_{p,p}(T, d_\infty)^p \le A$.


Proof The proof is self-contained. However, your task will be much easier if you first study Theorems 19.2.4 and 19.2.10, which have nearly the same proof.

The set $|T|^p$ is a subset of the ball of $L^1(\mu)$ of center 0 and radius $B$, where $\mu$ is the counting measure on $\{1, \ldots, N\}$. The first step of the proof is to take advantage of this through Theorem 19.2.1. Consider the largest integer $\tau$ for which $2^\tau \le B/A$. Since $B \ge A$, we have $\tau \ge 0$ and $2^{-\tau} \le 2A/B$. Recalling that for a subset $I$ of $\{1, \ldots, N\}$ we have $\mu(I) = \operatorname{card} I$, homogeneity and Theorem 19.2.1 provide us with an admissible sequence of partitions $(\mathcal{D}_n)$ of $|T|^p$ and, for each $D \in \mathcal{D}_n$, an integer $\ell^*(D) \in \mathbb{Z}$ such that if for $t \in |T|^p$ we set$^{11}$

$$\ell^*(t, n) = \ell^*(D_n(t)) , \tag{19.100}$$

then (according to (19.49) and (19.48), respectively)

$$\forall t \in |T|^p , \quad \operatorname{card}\{ i \le N ;\ t_i \ge 2^{-\ell^*(t, n)} \} \le 2^{n+\tau} \le \frac{2^n B}{A} , \tag{19.101}$$

$$\forall t \in |T|^p , \quad \sum_{n \ge 0} 2^{n - \ell^*(t, n)} \le 18 \cdot 2^{-\tau} B \le LA . \tag{19.102}$$

$^{11}$ As usual, $D_n(t)$ is the element of $\mathcal{D}_n$ containing $t$.

The second step of the proof is to bring back to $T$ the information we just gathered about $|T|^p$. This is done in the most straightforward manner. We consider the canonical map $\varphi : T \to |T|^p$ given by $\varphi((t_i)_{i \le N}) = (|t_i|^p)_{i \le N}$. We consider on $T$ the admissible sequence of partitions $(\mathcal{C}_n)$, where $\mathcal{C}_n$ consists of the sets $\varphi^{-1}(D)$ for $D \in \mathcal{D}_n$. For $t \in T$, we define $\ell(t, n) = \ell^*(\varphi(t), n)$, and this number depends only on $C_n(t)$ because $\ell^*(\varphi(t), n)$ depends only on $D_n(\varphi(t))$. Moreover, we deduce from (19.101) and (19.102), respectively, that

$$\forall t \in T , \quad \operatorname{card}\{ i \le N ;\ |t_i|^p \ge 2^{-\ell(t, n)} \} \le \frac{2^n B}{A} , \tag{19.103}$$

$$\forall t \in T , \quad \sum_{n \ge 0} 2^{n - \ell(t, n)} \le LA . \tag{19.104}$$

The obvious move now is to combine this information with the information provided by (19.96). Denoting by $\mathcal{A}_n$ the partition generated by $\mathcal{B}_n$ and $\mathcal{C}_n$, the sequence $(\mathcal{A}_n)$
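is increasing and $\operatorname{card} \mathcal{A}_n \le N_{n+1}$. Moreover, since $A_n(t) \subset B_n(t)$, (19.96) implies

$$\forall t \in T , \quad \sum_{n \ge 0} 2^n \Delta(A_n(t), d_\infty)^p \le A , \tag{19.105}$$

and furthermore, the integer $\ell(t, n)$ depends only on $A_n(t)$.

After these preparations, we start the main construction. For $D \in \mathcal{A}_n$, $n \ge 0$, let us choose in an arbitrary manner $v_n(D) \in D$, and set $\pi_n(t) = v_n(A_n(t))$. We write $\pi_n(t) = (\pi_{n,i}(t))_{i \le N}$, and we define

$$I_0(t) = \{ i \le N ;\ |\pi_{0,i}(t)|^p \ge 2^{-\ell(t, 0)} \} . \tag{19.106}$$

For $n \ge 1$, we further define

$$I_n(t) = \{ i \le N ;\ |\pi_{n,i}(t)|^p \ge 2^{-\ell(t, n)} ,\ 0 \le k < n \Rightarrow |\pi_{k,i}(t)|^p < 2^{-\ell(t, k)} \} .$$

It is important that $I_n(t)$ depends only on $A_n(t)$, so that there are at most $\operatorname{card} \mathcal{A}_n \le N_{n+1}$ sets of this type. Next, since $|t_i - \pi_{n,i}(t)| \le \Delta(A_n(t), d_\infty) \le \Delta(B_n(t), d_\infty)$, we have $\lim_{n \to \infty} |t_i - \pi_{n,i}(t)| = 0$ by (19.96), while $2^{-\ell(t, n)} \to 0$ by (19.104), and thus

$$\{ i \le N ;\ t_i \ne 0 \} \subset \bigcup_{n \ge 0} I_n(t) . \tag{19.107}$$

Finally, we note from (19.103) that

$$\operatorname{card} I_n(t) \le \frac{2^n B}{A} . \tag{19.108}$$

The definition of $I_n(t)$ shows that for $n \ge 1$ and $i \in I_n(t)$, we have $|\pi_{n-1,i}(t)|^p < 2^{-\ell(t, n-1)}$, so that

$$|t_i| \le |t_i - \pi_{n-1,i}(t)| + |\pi_{n-1,i}(t)| \le \Delta(A_{n-1}(t), d_\infty) + 2^{-\ell(t, n-1)/p}$$

and hence

$$|t_i|^p \le K(p) \big( \Delta(A_{n-1}(t), d_\infty)^p + 2^{-\ell(t, n-1)} \big) := c(t, n) . \tag{19.109}$$

Let us define $c(t, 0) = \Delta(T, d_\infty)^p$. Since $0 \in T$, (19.109) remains true for $n = 0$. We have then

$$n \ge 0 ,\ i \in I_n(t) \Rightarrow |t_i|^p \le c(t, n) . \tag{19.110}$$

Moreover, (19.105) and (19.104) imply

$$\forall t \in T , \quad \sum_{n \ge 0} 2^n c(t, n) \le K(p) A . \tag{19.111}$$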
We consider the family $\mathcal{F}$ of all pairs $(I_n(t), 2^{-n})$ for $t \in T$ and $n \ge 0$, and we prove that

$$|T|^p \subset K(p) A \operatorname{conv} \bigcup_{(I, a) \in \mathcal{F}} B_\infty(I, a) . \tag{19.112}$$

We recall the notation (19.67), and we write $|t|^p = (|t_i|^p)_{i \le N}$. For $n \ge 0$, we define $u_n$ by $2^n c(t, n) u_n = |t|^p \mathbf{1}_{I_n(t)}$, so that, using (19.110),

$$u_n = \frac{1}{2^n c(t, n)} |t|^p \mathbf{1}_{I_n(t)} \in B_\infty(I_n(t), 2^{-n}) . \tag{19.113}$$

We then have, using (19.107) in the first equality,

$$|t|^p = \sum_{n \ge 0} |t|^p \mathbf{1}_{I_n(t)} = \sum_{n \ge 0} 2^n c(t, n) u_n . \tag{19.114}$$

Together with (19.111) and (19.113), the relation (19.114) proves that $|t|^p \in K(p) A \operatorname{conv} \bigcup_{(I, a) \in \mathcal{F}} B_\infty(I, a)$, and hence (19.112). (The reason why we can take a convex hull rather than the closure of a convex hull is that there is only a finite number of possibilities for the sets $I_n(t)$.)

It remains now to deduce (19.99) from (19.112). This requires a careful enumeration of the pairs $(I, a) \in \mathcal{F}$, for which we basically copy the argument given at the end of the proof of Theorem 19.2.4. Consider the set $\mathcal{Z}_n$ consisting of all the sets of the type $I_n(t)$ for $t \in T$, so that $\operatorname{card} \mathcal{Z}_n \le N_{n+1}$. We find a sequence $(I_k)_{k \ge 1}$ of sets such that $I_k = \emptyset$ for $k \le N_1$ and that for $n \ge 0$, $\mathcal{Z}_n = \{ I_k ;\ N_{n+1} < k \le N_{n+2} \}$. This is possible because $\operatorname{card} \mathcal{Z}_n \le N_{n+1} \le N_{n+2} - N_{n+1}$.

Then any $(I, a) \in \mathcal{F}$ is such that for some $n \ge 0$, we have $I \in \mathcal{Z}_n$ and $a = 2^{-n}$. Thus, $I = I_k$ where $N_{n+1} < k \le N_{n+2}$, so that, as in (19.74), $\log(k+1) \le 2^{n+2}$, and consequently $2^{-n} \le 4/\log(k+1)$. Thus (19.99) follows from (19.112). Furthermore, since $k > N_{n+1}$, we have $2^n \le L \log k$, and (19.108) implies (19.98). □


The smallness criterion provided by (19.99) is perfectly adapted to the control of $\mathrm{E}\|U_J\|^p$.
Theorem 19.3.3 Consider the set $T = \{ (x^*(x_i))_{i \le N} ;\ x^* \in X_1^* \}$ of (19.88). Assume (19.96) and let $B$ be as in (19.97). Consider $\varepsilon > 0$ and $\delta \le 1$ such that

$$\delta \le \frac{A}{B \varepsilon N^\varepsilon \log N} . \tag{19.115}$$

Then if the r.v.s $(\delta_i)_{i \le N}$ are as in (19.86) and $J = \{ i \le N ;\ \delta_i = 1 \}$, for $v \ge 6$, we have

$$\mathrm{P}\Big( \|U_J\|^p \ge v K(p) \frac{A}{\varepsilon \log N} \Big) \le L \exp\Big( -\frac{v}{L} \Big)$$

and in particular

$$\mathrm{E}\|U_J\|^p \le K(p) \frac{A}{\varepsilon \log N} . \tag{19.116}$$
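Proof The magic is that

$$\sup_{t \in B_\infty(I, a)} \sum_{i \le N} \delta_i t_i = \sup_{t \in B_\infty(I, a)} \sum_{i \in I} \delta_i t_i \le a \sum_{i \in I} \delta_i ,$$

so that (19.99) implies

$$\sup_{t \in |T|^p} \sum_{i \le N} \delta_i t_i \le K(p) A \sup_{k \ge 1} \frac{1}{\log(k+1)} \sum_{i \in I_k} \delta_i . \tag{19.117}$$

We will control the right-hand side using the union bound. For $k \ge 1$, we have $\operatorname{card} I_k \le L_0 \log(k+1) B/A$ by (19.98), so that, by (19.115),

$$\delta \operatorname{card} I_k \le \frac{L_0 \log(k+1)}{\varepsilon N^\varepsilon \log N} .$$

We recall the inequality (11.70): If $u \ge 6 \delta \operatorname{card} I$,

$$\mathrm{P}\Big( \sum_{i \in I} \delta_i \ge u \Big) \le \exp\Big( -\frac{u}{2} \log \frac{u}{\delta \operatorname{card} I} \Big) .$$

Considering $v \ge 6$, we use this inequality for $u = L_0 v \log(k+1)/(\varepsilon \log N)$, which satisfies $u \ge v N^\varepsilon \delta \operatorname{card} I_k \ge 6 \delta \operatorname{card} I_k$ and hence $u/(\delta \operatorname{card} I_k) \ge N^\varepsilon$, to obtain

$$\mathrm{P}\Big( \sum_{i \in I_k} \delta_i \ge \frac{L_0 v \log(k+1)}{\varepsilon \log N} \Big) \le \exp\Big( -\frac{L_0 v \log(k+1)}{2 \varepsilon \log N} \log(N^\varepsilon) \Big) = \exp\Big( -\frac{L_0 v \log(k+1)}{2} \Big) . \tag{19.118}$$

Thus, if we define the event

$$\Omega(v) : \ \forall k \ge 1 , \ \sum_{i \in I_k} \delta_i \le \frac{L_0 v \log(k+1)}{\varepsilon \log N} ,$$

we obtain from (19.118) that $\mathrm{P}(\Omega(v)^c) \le L \exp(-v/L)$. When $\Omega(v)$ occurs, for $k \ge 1$, we have

$$\frac{1}{\log(k+1)} \sum_{i \in I_k} \delta_i \le \frac{L v}{\varepsilon \log N} .$$

Then (19.117) and (19.89) imply $\|U_J\|^p \le K(p) v A/(\varepsilon \log N)$. □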
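The proof relies on the binomial tail bound (11.70). As a sanity check, and not as a substitute for the proof given earlier in the book, the Python sketch below compares the exact tail $\mathrm{P}(\sum_{i \in I} \delta_i \ge u)$ of a binomial variable with $\operatorname{card} I$ trials and success probability $\delta$ against $\exp(-(u/2)\log(u/(\delta \operatorname{card} I)))$, on a grid of values $u \ge 6\delta \operatorname{card} I$. The grid, the step size and the underflow cutoff are arbitrary choices of ours.

```python
import math


def binomial_tail(n, d, u):
    """Exact P(Bin(n, d) >= u) for real u, by summing the probability mass function."""
    k0 = max(0, math.ceil(u))
    return sum(math.comb(n, k) * d ** k * (1 - d) ** (n - k) for k in range(k0, n + 1))


def bound_11_70(n, d, u):
    """Right-hand side of (11.70): exp(-(u/2) log(u / (d n))), for u >= 6 d n."""
    return math.exp(-(u / 2) * math.log(u / (d * n)))


def holds_on_grid(n, d, step=0.5, floor=1e-250):
    """Check tail <= bound for u = 6dn, 6dn + step, ... up to n.

    The scan stops once the bound falls below `floor`, to stay away from
    floating-point underflow; this cutoff is an arbitrary choice.
    """
    u = 6 * d * n
    while u <= n:
        b = bound_11_70(n, d, u)
        if b < floor:
            break
        if binomial_tail(n, d, u) > b * (1 + 1e-9):
            return False
        u += step
    return True


if __name__ == "__main__":
    for n, d in [(50, 0.01), (100, 0.02), (200, 0.005), (400, 0.01)]:
        print(n, d, holds_on_grid(n, d))
```

The factor 6 matters: the bound follows from the Chernoff estimate $\exp(-u \log(u/\mu) + u - \mu)$, with $\mu = \delta \operatorname{card} I$, once $\frac{1}{2}\log x \ge 1 - 1/x$ for $x = u/\mu$, which holds for $x \ge 6$.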
We finally come to the control of $T$. We recall the functionals $\gamma_{\alpha,\beta}$ of (4.5).

Theorem 19.3.4 Under the conditions of Theorem 19.3.1, the set $T$ of (19.88) satisfies

$$\gamma_{p,p}(T, d_\infty) \le K(p, \eta) S (\log N)^{1/p} . \tag{19.119}$$

Before the proof, we consider the (quasi) distance $d_\infty$ on $X_1^*$ defined by

$$d_\infty(x^*, y^*) = \max_{i \le N} |x^*(x_i) - y^*(x_i)| .$$

The map $\psi : X_1^* \to T$ given by $\psi(x^*) = (x^*(x_i))_{i \le N}$ satisfies

$$d_\infty(\psi(x^*), \psi(y^*)) = d_\infty(x^*, y^*) . \tag{19.120}$$

Lemma 19.3.5 We have

$$e_k(X_1^*, d_\infty) \le K(p, \eta) S 2^{-k/p} (\log N)^{1/p} \tag{19.121}$$

or, equivalently, for $\varepsilon > 0$,

$$\log N(X_1^*, d_\infty, \varepsilon) \le K(p, \eta) \Big( \frac{S}{\varepsilon} \Big)^p \log N . \tag{19.122}$$

Here, $X_1^*$ is the unit ball of $X^*$ for the original dual norm, $N(X_1^*, d_\infty, \varepsilon)$ is the smallest number of balls for $d_\infty$ of radius $\varepsilon$ needed to cover $X_1^*$, and $e_k$ is defined in (2.36).

It would be nice to have a simple proof of this statement. The only proof we know is somewhat indirect. It involves geometric ideas. First, one proves a “duality” result, namely, that if $W$ denotes the convex hull of the points $(\pm x_i)_{i \le N}$, to prove (19.122), it suffices to show that

$$\log N(W, \|\cdot\|, \varepsilon) \le K(p, \eta) \Big( \frac{S}{\varepsilon} \Big)^p \log N . \tag{19.123}$$

This duality result is proved in [24], Proposition 2, (ii). We do not reproduce the simple and very nice argument, which is not related to the ideas of this work. The proof of (19.123) also involves geometrical ideas. Briefly, since $X^*$ is $p$-convex, it is classical that “$X$ is of type $q$, with a type $q$ constant depending only on $p$ and $\eta$”, as proved in [57], and then the conclusion follows from a beautiful probabilistic argument of Maurey, which is reproduced, for example, in [120], Lemma 3.2. □


Exercise 19.3.6 Deduce (19.121) from (19.123) and Proposition 2, (ii) of [24].


Proof of Theorem 19.3.4. Recalling that we assume that the dual norm of $X$ is $p$-convex, we combine (19.121) with Theorem 4.1.4 (used for $\alpha = p$). □


Proof of Theorem 19.3.1. We reformulate (19.119) as follows: There exists an admissible sequence $(\mathcal{B}_n)$ on $X_1^*$ for which

$$\forall x^* \in X_1^* , \quad \sum_{n \ge 0} 2^n \Delta(B_n(x^*), d_\infty)^p \le K(p, \eta) S^p \log N := A . \tag{19.124}$$

It follows from (19.124) and (19.120) that the set $T$ of (19.88) satisfies (19.96). Thus, this is also the case of the smaller set $T_C$ of (19.93). Since $\sum_{i \le N} |t_i|^p \le C$ for $t \in T_C$, this set also satisfies (19.97) for $B = \max(A, C)$. We then conclude with Theorem 19.3.3. □


To conclude this section, we describe an example showing that Theorem 19.3.1 is very close to being optimal in certain situations. Consider two integers $r, m$ and set $N = rm$. We divide $\{1,\ldots,N\}$ into $m$ disjoint subsets $I_1,\ldots,I_m$ of cardinality $r$. Consider $1 < q \le 2$ and the canonical bases $(e_i)_{i\le N}$, $(e_j)_{j\le m}$ of $\ell_q^N$ and $\ell_q^m$, respectively. Consider the operator $U : \ell_q^N \to \ell_q^m = X$ such that $U(e_i) = e_j$, where $j$ is such that $i \in I_j$. Thus, $S = 1$. It is classical [57] that $X^* = \ell_p^m$ is $p$-convex, where $p$ is the conjugate exponent of $q$. Consider $\delta$ with $\delta^r = 1/m$. Then

$$\mathsf{P}\big(\exists j \le m;\ \forall i \in I_j,\ \delta_i = 1\big) = 1 - \Big(1 - \frac{1}{m}\Big)^m \ge \frac{1}{L} ,$$

and when this event occurs, we have $\|U_J\| \ge r^{1/p}$ since $\|\sum_{i\in I_j} e_i\| = r^{1/q}$ and $\|U(\sum_{i\in I_j} e_i)\| = r\|e_j\| = r$. Thus,

$$\mathsf{E}\|U_J\|^p \ge \frac{r}{L} . \tag{19.125}$$

Let us now try to apply Theorem 19.3.1 to this situation, so that $x_i = e_j$ for $i \in I_j$. Then we must take $C$ large enough that $\|\cdot\|_C = \|\cdot\|$, that is, $C = \sup_{\|x^*\|\le 1}\sum_{i\le N}|x^*(x_i)|^2$. Since there are $r$ values of $i$ for which $x_i = e_j$, we get

$$\sum_{i\le N}|x^*(x_i)|^2 = r\sum_{j\le m}|x^*(e_j)|^2 .$$

This can be as large as $r$ for $\|x^*\| \le 1$, so one has to take $C = r$. Then $B = r$ whenever $K(q)\log N \le r$. Let us choose $\varepsilon = 1/(2r)$, so that for large $m$

$$\delta = \frac{1}{m^{1/r}} \le \frac{1}{B\varepsilon N^{\varepsilon}} = \frac{1}{r\varepsilon N^{\varepsilon}} = \frac{2}{(rm)^{1/(2r)}} .$$

Thus (19.125) shows that (19.95) gives the exact order of $\|U_J\|$ in this case.
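The two numerical facts used in this example, namely $1 - (1 - 1/m)^m \ge 1 - 1/e$ and the bound $\delta = m^{-1/r} \le 1/(B\varepsilon N^{\varepsilon})$ for $B = r$, $\varepsilon = 1/(2r)$, can be checked with a short sketch. The Monte Carlo part assumes independent selectors with $\mathsf{P}(\delta_i = 1) = \delta$ as in (19.86); the function names and the sample values of $r, m$ are hypothetical.

```python
import math
import random


def block_hit_probability(m: int) -> float:
    """Exact P(some block I_j is fully selected) when delta**r = 1/m."""
    return 1.0 - (1.0 - 1.0 / m) ** m


def block_hit_probability_mc(r: int, m: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate with selectors delta_i ~ Bernoulli(m**(-1/r)), N = r*m."""
    rng = random.Random(seed)
    delta = m ** (-1.0 / r)
    hits = 0
    for _ in range(trials):
        # The event: for some j <= m, all r selectors in the block I_j equal 1.
        if any(all(rng.random() < delta for _ in range(r)) for _ in range(m)):
            hits += 1
    return hits / trials


def delta_condition(r: int, m: int) -> bool:
    """Check delta <= 1/(B * eps * N**eps) for B = r, eps = 1/(2r), N = r*m."""
    delta = m ** (-1.0 / r)
    eps = 1.0 / (2 * r)
    N = r * m
    return delta <= 1.0 / (r * eps * N ** eps)


print(block_hit_probability(50), block_hit_probability_mc(3, 50, 2000))
```

The exact formula and the simulation should agree up to sampling error; the lower bound $1/L$ in the text can be taken to be $1 - 1/e$.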


19.3.2 The Λ(p)-Problem


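We denote by $\lambda$ the uniform measure on $[0,1]$. Consider functions $(x_i)_{i\le N}$ on $[0,1]$ satisfying the following two conditions:

$$\forall i \le N,\quad \|x_i\|_\infty \le 1 , \tag{19.126}$$

$$\text{the sequence } (x_i)_{i\le N} \text{ is orthogonal in } L^2 = L^2(\lambda) . \tag{19.127}$$

For a number $p \ge 1$, we denote by $\|\cdot\|_p$ the norm in $L^p(\lambda)$. Thus, if $p \ge 2$, for all numbers $(a_i)_{i\le N}$, we have

$$\Big\|\sum_{i\le N} a_i x_i\Big\|_p \ge \Big\|\sum_{i\le N} a_i x_i\Big\|_2 .$$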


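J. Bourgain [22] proved the remarkable fact that we can find a set $J$, with $\operatorname{card} J \ge N^{2/p}$, for which we have an estimate in the reverse direction:$^{12}$

$$\forall (a_i)_{i\in J},\quad \Big\|\sum_{i\in J} a_i x_i\Big\|_p \le K(p)\Big(\sum_{i\in J} a_i^2\Big)^{1/2} . \tag{19.128}$$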


Bourgain's argument is probabilistic, showing in fact that a random choice of $J$ works with positive probability. The most interesting application of this theorem is the case of the trigonometric system, say $x_i(t) = \cos(2\pi k_i t)$, where the integers $(k_i)_{i\le N}$ are all different. Even in that case, no simpler proof is known.

We consider r.v.s $\delta_i$ as in (19.86) with $\delta = N^{2/p-1}$, and we set $J = \{i \le N;\ \delta_i = 1\}$.


Theorem 19.3.7 There is a r.v. $W \ge 0$ with $\mathsf{E}W \le K$ such that for any numbers $(a_i)_{i\in J}$, we have

$$\Big\|\sum_{i\in J} a_i x_i\Big\|_p \le W\Big(\sum_{i\in J} a_i^2\Big)^{1/2} . \tag{19.129}$$

$^{12}$ Proving this is (a version of) what was known as the $\Lambda(p)$ problem.
Here, as well as in the rest of this section, $K$ denotes a number depending only on $p$, which need not be the same on each occurrence. Since $\mathsf{P}(\operatorname{card} J \ge N^{2/p}) \ge 1/4$,$^{13}$ recalling that $\mathsf{E}W \le K$ and using Markov's inequality, with positive probability we have both $\operatorname{card} J \ge N^{2/p}$ and $W \le K$, and in this case, we obtain (19.128). It is possible with minimal additional effort to prove a slightly stronger statement than (19.129), where the $L^p$ norm on the left is replaced by a stronger norm.$^{14}$ We refer the reader to [132] for this. In the present presentation, we have chosen the simplest possible result. I am grateful to Donggeun Ryou for a significant simplification of the argument.$^{15}$
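The lower bound on $\mathsf{P}(\operatorname{card} J \ge N^{2/p})$ is easy to observe by simulation: since $\mathsf{E}\operatorname{card} J = \delta N = N^{2/p}$, the Central Limit Theorem makes this probability close to $1/2$ for large $N$. The sketch below assumes selectors as in (19.86), independent with $\mathsf{P}(\delta_i = 1) = \delta$; the function name and the sample values $N = 1600$, $p = 4$ are illustrative only.

```python
import random


def prob_card_J_at_least_mean(N: int, p: float, trials: int, seed: int = 1) -> float:
    """Estimate P(card J >= N**(2/p)) for J = {i <= N : delta_i = 1},
    where the delta_i are independent with P(delta_i = 1) = N**(2/p - 1)."""
    rng = random.Random(seed)
    delta = N ** (2.0 / p - 1.0)
    mean = N ** (2.0 / p)  # E card J = delta * N
    successes = 0
    for _ in range(trials):
        card_J = sum(1 for _ in range(N) if rng.random() < delta)
        if card_J >= mean:
            successes += 1
    return successes / trials


print(prob_card_J_at_least_mean(1600, 4.0, 300))  # typically a little above 1/2
```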

Theorem 19.3.7 is mostly a consequence of the following special case of Theorem 19.3.1, which we state again to avoid confusion of notation. We recall Definition 4.1.2 of a 2-convex Banach space and the corresponding constant $\eta$. Given a number $C > 0$, we denote by $\|\cdot\|_C$ the norm on $X$ such that the unit ball of the dual norm is the set

$$X^*_{1,C} := \{x^* \in X^*;\ \|x^*\|_C \le 1\} = \Big\{x^* \in X^*;\ \|x^*\| \le 1,\ \sum_{i\le N} x^*(x_i)^2 \le C\Big\} . \tag{19.130}$$


Theorem 19.3.8 Consider a Banach space $X$ such that $X^*$ is 2-convex with corresponding constant $\eta$. Then there exists a number $K(\eta)$ depending only on $\eta$ with the following property. Consider elements $x_1,\ldots,x_N$ of $X$ with $\|x_i\| \le 1$. Denote by $U$ the operator $\ell_2^N \to X$ such that $U(e_i) = x_i$. Consider a number $C > 0$ and define $B = \max(C, K(\eta)\log N)$. Consider a number $\delta > 0$ and assume that for some $\varepsilon > 0$

$$\delta \le \frac{1}{B\varepsilon N^{\varepsilon}} . \tag{19.131}$$

Consider r.v.s $(\delta_i)_{i\le N}$ as in (19.86) and $J = \{i \le N;\ \delta_i = 1\}$. Then the restriction $U_J$ of $U$ to the span of the vectors $(e_i)_{i\in J}$ satisfies

$$\mathsf{E}\|U_J\|_C^2 \le \frac{K(\eta)}{\varepsilon} , \tag{19.132}$$

where $\|U_J\|_C$ is the operator norm of $U_J$ when $X$ is provided with the norm $\|\cdot\|_C$.


Despite the fact that Bourgain's result is tight, there is some room in the proof of Theorem 19.3.7. Some of the choices we make are simply convenient and by no means canonical. We fix $p > 2$ once and for all, and we set $p_1 = 3p/2$. Denoting by $q_1$ the conjugate exponent of $p_1$, the dual $L^{q_1}$ of $X = L^{p_1}$ is 2-convex [57], and the corresponding constant $\eta$ depends on $p$ only.$^{16}$ For a number $C > 0$, we consider on $X = L^{p_1}$ the norm $\|\cdot\|_C$ constructed before the statement of Theorem 19.3.8. From now on, $\delta = N^{2/p-1}$.

$^{13}$ Note that this is obvious for large $N$ by the Central Limit Theorem.
$^{14}$ More specifically, the so-called $L^{p,1}$ norm.
$^{15}$ And in particular for observing that there is no need of a special argument to control the large values of the function $f$ in the proof below.


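Lemma 19.3.9 Setting $C_1 = N^{1/2-1/p}$ and $C_2 = e^{-1}N^{1-2/p}\log N$, we have

$$\mathsf{E}\|U_J\|_{C_1} \le K ;\qquad \mathsf{E}\|U_J\|_{C_2} \le K\sqrt{\log N} .$$

Proof We apply Theorem 19.3.8 to the case $X = L^{p_1}$, whose dual $L^{q_1}$ is 2-convex, and we note that it suffices to prove the result for $N$ large enough.

We choose first $C = C_1\ (= N^{1/2-1/p})$ and $\varepsilon = 1/2 - 1/p$. For $N$ large enough, $B = \max(C, K\log N) = N^{1/2-1/p}$, so that $BN^{\varepsilon} = N^{1-2/p} = \delta^{-1}$. Since $\varepsilon \le 1$, (19.131) holds. Since $1/\varepsilon \le K$, (19.132) then proves that $\mathsf{E}\|U_J\|_{C_1}^2 \le K$.

Next, we choose $C = C_2\ (= e^{-1}N^{1-2/p}\log N)$ and $\varepsilon = 1/\log N$. Thus, $N^{\varepsilon} = e$, and for $N$ large enough, $B = \max(C, K\log N) = C$, so that $B\varepsilon = e^{-1}N^{1-2/p}$ and $B\varepsilon N^{\varepsilon} = N^{1-2/p} = \delta^{-1}$. Thus (19.131) holds again, and (19.132) now proves that $\mathsf{E}\|U_J\|_{C_2}^2 \le K\log N$. The stated bounds follow since $\mathsf{E}\|U_J\|_C \le (\mathsf{E}\|U_J\|_C^2)^{1/2}$. □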
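The exponent bookkeeping in this proof can be double-checked numerically. The sketch below takes $K = 1$ in $B = \max(C, K\log N)$ as a purely hypothetical value (the text leaves $K(\eta)$ unspecified), together with the sample values $N = 10^6$, $p = 4$, and checks that $BN^{\varepsilon} = \delta^{-1}$ for the first choice and $B\varepsilon N^{\varepsilon} = \delta^{-1}$ for the second.

```python
import math


def lemma_19_3_9_parameters(N: float, p: float, K: float = 1.0):
    """Parameters (delta, B, eps) for the two applications of Theorem 19.3.8.

    K stands in for the unspecified constant in B = max(C, K log N);
    K = 1.0 is a hypothetical choice made only for illustration.
    """
    delta = N ** (2 / p - 1)
    C1, eps1 = N ** (0.5 - 1 / p), 0.5 - 1 / p
    C2, eps2 = math.exp(-1) * N ** (1 - 2 / p) * math.log(N), 1 / math.log(N)
    B1 = max(C1, K * math.log(N))
    B2 = max(C2, K * math.log(N))
    return delta, (B1, eps1), (B2, eps2)


N, p = 1e6, 4.0
delta, (B1, eps1), (B2, eps2) = lemma_19_3_9_parameters(N, p)
# Condition (19.131): delta <= 1 / (B * eps * N**eps); in the second case it is an equality.
print(delta <= 1 / (B1 * eps1 * N ** eps1), math.isclose(B2 * eps2 * N ** eps2, 1 / delta))
```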


Lemma 19.3.10 Consider a measurable function $f$. Assume that $\|f\|_{C_1} \le 1$ and $\|f\|_{C_2} \le \sqrt{\log N}$. Then $\|f\|_p \le K$.

Proof of Theorem 19.3.7. Let

$$V = \|U_J\|_{C_1} + \frac{1}{\sqrt{\log N}}\|U_J\|_{C_2} ,$$

so that $\mathsf{E}V \le K$ by Lemma 19.3.9. Consider numbers $(a_i)_{i\in J}$ and $y := \sum_{i\in J} a_i e_i$. For $j = 1, 2$, we have $\|U_J(y)\|_{C_j} \le \|U_J\|_{C_j}\|y\|_2$. The function $f = (V\|y\|_2)^{-1}U_J(y)$ satisfies the hypotheses of Lemma 19.3.10, so that $\|f\|_p \le K$. That is, $\|U_J(y)\|_p \le KV\|y\|_2$, that is, $\|\sum_{i\in J} a_i x_i\|_p \le KV(\sum_{i\in J} a_i^2)^{1/2}$, which is (19.129) for $W = KV$. □


Before we prove Lemma 19.3.10, we need to learn how to use information on $\|f\|_{C_j}$. This is done through duality, in the form of the following lemma, which is a consequence of the Hahn-Banach theorem:

Lemma 19.3.11 If $f \in L^{p_1}$ satisfies $\|f\|_C \le 1$, then $f \in \mathcal{C} := \operatorname{conv}(\mathcal{C}_1 \cup \mathcal{C}_2)$, where $\mathcal{C}_1 = \{g;\ \|g\|_{p_1} \le 1\}$ and $\mathcal{C}_2 = \{\sum_{i\le N}\beta_i x_i;\ \sum_{i\le N}\beta_i^2 \le C^{-1}\}$.

Proof The set $\mathcal{C}_2$ is closed, bounded and finite-dimensional, so it is compact. The set $\mathcal{C}_1$ is closed, so that the set $\mathcal{C}$ is closed and obviously convex. If $f \notin \mathcal{C}$, by the Hahn-Banach theorem, there exists $x^* \in L^{q_1}$ such that $x^*(f) > 1$ but $x^*(g) \le 1$ for $g \in \mathcal{C}$. That $x^*(g) \le 1$ for $g \in \mathcal{C}_1$ implies that $\|x^*\|_{q_1} \le 1$. That $x^*(g) \le 1$ for $g \in \mathcal{C}_2$ implies that $\sum_{i\le N} x^*(x_i)^2 \le C$. Thus, $\|x^*\|_C \le 1$ by definition of the norm $\|x^*\|_C$. Since $x^*(f) > 1$, from (19.1), we have $\|f\|_C > 1$. □

$^{16}$ So that when we apply Theorem 19.3.8 to $X$, the corresponding constant $K(\eta)$ depends only on $p$.
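The step "$x^*(g) \le 1$ for $g \in \mathcal{C}_2$ implies $\sum_{i\le N}x^*(x_i)^2 \le C$" is the Cauchy-Schwarz computation $\sup\{\sum_i\beta_i a_i;\ \sum_i\beta_i^2 \le C^{-1}\} = (C^{-1}\sum_i a_i^2)^{1/2}$ with $a_i = x^*(x_i)$. A small numerical sketch of this identity, with arbitrary illustrative numbers and hypothetical helper names:

```python
import math
import random


def sup_over_C2(a, C):
    """sup of sum_i beta_i * a_i over sum_i beta_i**2 <= 1/C (attained at beta parallel to a)."""
    return math.sqrt(sum(x * x for x in a) / C)


def value_at_random_beta(a, C, rng):
    """sum_i beta_i * a_i for a random beta on the sphere sum_i beta_i**2 = 1/C."""
    beta = [rng.gauss(0.0, 1.0) for _ in a]
    scale = 1.0 / math.sqrt(C * sum(b * b for b in beta))
    return sum(scale * b * x for b, x in zip(beta, a))


rng = random.Random(0)
a = [0.3, -1.2, 0.5, 2.0]  # plays the role of (x*(x_i))_{i <= N}
for C in (4.0, 6.0):
    # x*(g) <= 1 on C_2 exactly when sum_i a_i**2 <= C
    print(C, sup_over_C2(a, C) <= 1.0, sum(x * x for x in a) <= C)
```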


Proof of Lemma 19.3.10. The proof is based on the formula

$$\|f\|_p^p = p\int_0^\infty t^{p-1}\lambda(\{|f| \ge t\})\,dt , \tag{19.133}$$

where $\lambda$ is Lebesgue's measure on $[0,1]$, and on suitable bounds for the integrand. The method used to bound $\lambda(\{|f| \ge t\})$ depends on the value of $t$. For a certain quantity $D$ (depending on $N$), we will distinguish three cases. For $t \le 1$, we will use that $\lambda(\{|f| \ge t\}) \le 1$, so that $\int_0^1 t^{p-1}\lambda(\{|f| \ge t\})\,dt \le K$. For $1 \le t \le D$, we will use the hypothesis that $\|f\|_{C_1} \le 1$. For $t \ge D$, we will use the hypothesis that $\|f\|_{C_2} \le \sqrt{\log N}$.
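Formula (19.133) is the layer-cake representation of $\|f\|_p^p$. For a function sampled on a uniform grid of $[0,1]$ (so that $\lambda$ is replaced by the empirical measure of the samples), both sides can be compared numerically. The sketch below, with the test function $f(x) = 3x$ and $p = 3$ chosen only for illustration, uses a midpoint rule in $t$; the helper names are hypothetical.

```python
import bisect


def lp_power_direct(values, p):
    """Average of |f|^p over the samples: approximates ||f||_p^p."""
    return sum(abs(v) ** p for v in values) / len(values)


def lp_power_layer_cake(values, p, t_max, n_t=60_000):
    """Approximate p * int_0^t_max t^(p-1) * lambda({|f| >= t}) dt, where
    lambda({|f| >= t}) is the fraction of samples with |f| >= t.
    t_max should be at least max |f|, so that the integrand vanishes beyond it."""
    sorted_abs = sorted(abs(v) for v in values)
    m = len(sorted_abs)
    h = t_max / n_t
    total = 0.0
    for k in range(n_t):
        t = (k + 0.5) * h  # midpoint rule in t
        frac = (m - bisect.bisect_left(sorted_abs, t)) / m
        total += p * t ** (p - 1) * frac * h
    return total


xs = [(k + 0.5) / 4000 for k in range(4000)]
vals = [3 * x for x in xs]  # f(x) = 3x, so ||f||_3^3 = 27/4
print(lp_power_direct(vals, 3), lp_power_layer_cake(vals, 3, 3.0))
```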
Since $\|f\|_{C_1} \le 1$, by Lemma 19.3.11 (used for $C = C_1 = N^{1/2-1/p}$), we may write $f$ as a convex combination$^{17}$ $f = \tau_1 u_1 + \tau_2 u_2$, where $\|u_1\|_{p_1} \le 1$ and $u_2 = \sum_{i\le N}\beta_i x_i$ with $\sum_{i\le N}\beta_i^2 \le N^{1/p-1/2}$. By (19.126) and (19.127), we have $\|u_2\|_2^2 \le N^{1/p-1/2}$. Markov's inequality implies that for each $s \ge 1$,

$$\lambda(\{|u| \ge t\}) \le \frac{\|u\|_s^s}{t^s} , \tag{19.134}$$

and we combine (19.134) with the obvious inequality

$$\lambda(\{|f| \ge t\}) \le \lambda(\{|u_1| \ge t\}) + \lambda(\{|u_2| \ge t\}) \tag{19.135}$$

to obtain

$$\lambda(\{|f| \ge t\}) \le t^{-p_1} + t^{-2}N^{1/p-1/2} . \tag{19.136}$$

Recalling that $p > 2$, let us define $\alpha > 0$ by the relation $\alpha(p_1 - 2) = 1/2 - 1/p$, and let us set $D = N^{\alpha}$. Then for $t \le D$, we have $t^{-p_1} + t^{-2}N^{1/p-1/2} = t^{-p_1} + t^{-2}N^{-\alpha(p_1-2)} \le 2t^{-p_1}$. Since $p_1 = 3p/2$, we get $\int_1^D t^{p-1}\lambda(\{|f| \ge t\})\,dt \le 2\int_1^\infty t^{-1-p/2}\,dt \le K$. It remains only to control $\int_D^\infty t^{p-1}\lambda(\{|f| \ge t\})\,dt$.

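Since $\|f\|_{C_2} \le \sqrt{\log N}$, by Lemma 19.3.11 again (used now for $C = C_2 = e^{-1}N^{1-2/p}\log N$), we may write $f$ as a convex combination $f = \tau_1 v_1 + \tau_2 v_2$, where $\|v_1\|_{p_1} \le \sqrt{\log N}$ and $v_2 = \sum_{i\le N}\beta_i x_i$ with $\sum_{i\le N}\beta_i^2 \le (\log N)C_2^{-1} = eN^{2/p-1}$. Thus, $\|v_2\|_2^2 \le eN^{2/p-1}$, and since $\|x_i\|_\infty \le 1$, we also have $\|v_2\|_\infty \le \sum_{i\le N}|\beta_i| \le \sqrt{N}(\sum_{i\le N}\beta_i^2)^{1/2} \le 2N^{1/p}$. Just as in (19.136), we then obtain

$$\lambda(\{|f| \ge t\}) \le \lambda(\{|v_1| \ge t\}) + \lambda(\{|v_2| \ge t\}) \le (\log N)^{p_1/2}t^{-p_1} + \lambda(\{|v_2| \ge t\}) ,$$

so that $\int_D^\infty t^{p-1}\lambda(\{|f| \ge t\})\,dt \le (\log N)^{p_1/2}I_1 + I_2$, where $I_1 = \int_D^\infty t^{p-p_1-1}\,dt$ and $I_2 = \int_D^\infty t^{p-1}\lambda(\{|v_2| \ge t\})\,dt$. Now $I_1 \le KD^{p-p_1} = KD^{-p/2} = KN^{-\alpha p/2}$, so that $(\log N)^{p_1/2}I_1 \le K$. Since $\|v_2\|_2^2 \le eN^{2/p-1}$, we have $\lambda(\{|v_2| \ge t\}) \le eN^{2/p-1}t^{-2}$ by (19.134), and also $\lambda(\{|v_2| \ge t\}) = 0$ for $t > 2N^{1/p} \ge \|v_2\|_\infty$, so that

$$I_2 \le \int_0^{2N^{1/p}} t^{p-1}\lambda(\{|v_2| \ge t\})\,dt \le eN^{2/p-1}\int_0^{2N^{1/p}} t^{p-3}\,dt = K ,$$

and we have proved, as desired, that $\int_0^\infty t^{p-1}\lambda(\{|f| \ge t\})\,dt \le K$. □

$^{17}$ That is, $\tau_1, \tau_2 \ge 0$, $\tau_1 + \tau_2 = 1$.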
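The two explicit estimates in this proof can be evaluated directly. With $p_1 = 3p/2$ and $\alpha$ defined by $\alpha(p_1 - 2) = 1/2 - 1/p$, the sketch below checks that $t^{-2}N^{1/p-1/2} \le t^{-p_1}$ for $1 \le t \le D = N^{\alpha}$, and that the bound $eN^{2/p-1}\int_0^{2N^{1/p}}t^{p-3}\,dt$ on $I_2$ equals $e\,2^{p-2}/(p-2)$, whatever $N$. The sample values of $N$ and $p$ are arbitrary.

```python
import math


def split_point(N: float, p: float):
    """Return (p1, alpha, D) with p1 = 3p/2, alpha*(p1 - 2) = 1/2 - 1/p, D = N**alpha."""
    p1 = 1.5 * p
    alpha = (0.5 - 1 / p) / (p1 - 2)
    return p1, alpha, N ** alpha


def I2_bound(N: float, p: float) -> float:
    """e * N**(2/p - 1) * int_0^{2 N**(1/p)} t**(p - 3) dt, integrated in closed form (p > 2)."""
    upper = 2 * N ** (1 / p)
    return math.e * N ** (2 / p - 1) * upper ** (p - 2) / (p - 2)


N, p = 1e6, 4.0
p1, alpha, D = split_point(N, p)
ts = [D ** (j / 50) for j in range(51)]  # points of [1, D]
print(all(t ** -2 * N ** (1 / p - 0.5) <= t ** -p1 * (1 + 1e-9) for t in ts))
print([I2_bound(n, p) for n in (1e3, 1e6, 1e9)], math.e * 2 ** (p - 2) / (p - 2))
```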


19.4 Sidon Sets 


Let us recall that if $T$ is a compact abelian group, a character $\chi$ is a continuous map from $T$ to $\mathbb{C}$ with $|\chi(t)| = 1$ and $\chi(s+t) = \chi(s)\chi(t)$. Thus, $\chi(0) = 1$ and $\chi(-s) = \overline{\chi(s)}$. Throughout this section, we denote by $\mu$ the Haar probability measure on $T$. We recall from Lemma 7.3.6 that two different characters are orthogonal in $L^2(T, d\mu)$. Given a set $\Gamma$ of characters, we define its Sidon constant $\Gamma_{\mathrm{si}}$ (possibly infinite) as the smallest constant such that for each sequence of complex numbers $(a_\chi)_{\chi\in\Gamma}$, only finitely many of them nonzero, we have

$$\sum_{\chi\in\Gamma}|a_\chi| \le \Gamma_{\mathrm{si}}\sup_{t\in T}\Big|\sum_{\chi\in\Gamma}a_\chi\chi(t)\Big| . \tag{19.137}$$

We say that $\Gamma$ is a Sidon set if $\Gamma_{\mathrm{si}} < \infty$. To understand this definition and the next one, it is very instructive to consider the case where $T = \{-1,1\}^N$ and $\Gamma = \{\varepsilon_i,\ i \le N\}$, where $\varepsilon_i(t) = t_i$.$^{18}$ Sidon sets are standard fare in harmonic analysis. In a certain sense, the characters in a Sidon set are "independent".$^{19}$ Another measure of independence of the elements of $\Gamma$ is given by the smallest constant $\Gamma_{\mathrm{sg}}$ such that for all $p \ge 1$, we have, for all families $(a_\chi)$ as above,

$$\Big(\int\Big|\sum_{\chi\in\Gamma}a_\chi\chi\Big|^p d\mu\Big)^{1/p} \le \sqrt{p}\,\Gamma_{\mathrm{sg}}\Big(\sum_{\chi\in\Gamma}|a_\chi|^2\Big)^{1/2} . \tag{19.138}$$

This should be compared with Khintchin's inequality (6.3), a comparison which supports the idea that controlling $\Gamma_{\mathrm{sg}}$ is indeed a measure of independence. The subscript "sg" stands for "subgaussian", as in the subgaussian inequality (6.1.1). This is because, as is shown in Exercise 2.3.8 (which we advise the reader to carefully review now), the constant $\Gamma_{\mathrm{sg}}$ is within a universal constant$^{20}$ of the constant $\Gamma^*_{\mathrm{sg}}$ defined as the smallest number such that

$$\Big\|\sum_{\chi\in\Gamma}a_\chi\chi\Big\|_{\psi_2} \le \Gamma^*_{\mathrm{sg}}\Big(\sum_{\chi\in\Gamma}|a_\chi|^2\Big)^{1/2} , \tag{19.139}$$

where the norm $\|\cdot\|_{\psi_2}$ defined in (14.14) is for the measure $\mu$. The reader should also review (14.18), which explains how (19.139) is related to the subgaussian inequality (6.1.1).

$^{18}$ Here, of course, the group structure on $\{-1,1\}$ is given by ordinary multiplication.
$^{19}$ In a much stronger sense than just linear independence.
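For the instructive example $T = \{-1,1\}^N$, $\Gamma = \{\varepsilon_i,\ i \le N\}$, and real coefficients, both constants can be explored by brute force over the $2^N$ points of $T$: the supremum in (19.137) equals $\sum_i|a_i|$ (take $t_i = \operatorname{sign}(a_i)$), and the moments in (19.138) obey Khintchin's inequality. The sketch below is limited to real coefficients and small $N$; the helper names are hypothetical.

```python
import itertools
import math
import random


def sup_over_cube(a):
    """sup over t in {-1,1}^N of |sum_i a_i t_i|, by enumerating all 2^N points."""
    return max(abs(sum(ai * ti for ai, ti in zip(a, t)))
               for t in itertools.product((-1, 1), repeat=len(a)))


def moment(a, p):
    """(int |sum_i a_i t_i|^p dmu(t))^(1/p), mu uniform on {-1,1}^N."""
    points = list(itertools.product((-1, 1), repeat=len(a)))
    total = sum(abs(sum(ai * ti for ai, ti in zip(a, t))) ** p for t in points)
    return (total / len(points)) ** (1 / p)


rng = random.Random(3)
a = [rng.uniform(-1, 1) for _ in range(8)]
l2 = math.sqrt(sum(x * x for x in a))
print(sup_over_cube(a), sum(abs(x) for x in a))  # the two numbers agree
print([moment(a, p) / (math.sqrt(p) * l2) for p in (1, 2, 4, 6)])  # each ratio is <= 1
```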

One of the main results of this section is the following classical result. It relates the two "measures of independence" which we have just considered.

Theorem 19.4.1 (W. Rudin, G. Pisier) We have

$$\Gamma_{\mathrm{sg}} \le L\Gamma_{\mathrm{si}} . \tag{19.140}$$

There exists a function $g : \mathbb{R}^+ \to \mathbb{R}^+$ such that$^{21}$

$$\Gamma_{\mathrm{si}} \le g(\Gamma_{\mathrm{sg}}) . \tag{19.141}$$

Furthermore, we will prove a considerable generalization of (19.141), also due to Gilles Pisier [89] (after contributions by J. Bourgain and M. Lewko [20]). The most important consequence of this theorem is that $\Gamma_{\mathrm{sg}} < \infty$ if and only if $\Gamma_{\mathrm{si}} < \infty$.

As this is a book of probability rather than analysis, we will simplify some analytical details by assuming that $T$ is finite. We start with the rather easy part, the proof of (19.140).


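Lemma 19.4.2 Consider complex numbers $\beta_\chi$ for $\chi \in \Gamma$. Then we can find a function $f$ on $T$ with

$$\|f\|_1 \le \Gamma_{\mathrm{si}}\sup_{\chi\in\Gamma}|\beta_\chi| \tag{19.142}$$

and

$$\forall \chi \in \Gamma,\quad \int f\chi\,d\mu = \beta_\chi . \tag{19.143}$$

Here, the norm $\|f\|_1$ is the norm in $L^1(d\mu)$.

$^{20}$ That is, $\Gamma_{\mathrm{sg}} \le L\Gamma^*_{\mathrm{sg}} \le L\Gamma_{\mathrm{sg}}$.
$^{21}$ The proof we give shows that $\Gamma_{\mathrm{si}} \le L(1 + \Gamma_{\mathrm{sg}})^4$. It is known that $\Gamma_{\mathrm{si}} \le L\Gamma_{\mathrm{sg}}^2(1 + \log\Gamma_{\mathrm{sg}})$, but that it is not true that $\Gamma_{\mathrm{si}} \le L\Gamma_{\mathrm{sg}}$; see [88].

Proof Let us denote by $V$ the complex linear span of the functions $\chi$ for $\chi \in \Gamma$. Recalling that the characters are linearly independent because they form an orthogonal set, consider the linear functional $\phi$ on $V$ given by $\phi(\sum_{\chi\in\Gamma}\alpha_\chi\chi) = \sum_{\chi\in\Gamma}\alpha_\chi\beta_\chi$. Thus, if $h = \sum_{\chi\in\Gamma}\alpha_\chi\chi$, then, using (19.137),

$$|\phi(h)| \le \sup_{\chi\in\Gamma}|\beta_\chi|\sum_{\chi\in\Gamma}|\alpha_\chi| \le \Gamma_{\mathrm{si}}\sup_{\chi\in\Gamma}|\beta_\chi|\,\|h\|_\infty . \tag{19.144}$$

Let us provide the space of functions on $T$ and its subspaces with the supremum norm. The content of (19.144) is that $\phi$ is of norm $\le \Gamma_{\mathrm{si}}\sup_{\chi\in\Gamma}|\beta_\chi|$ on $V$. By the Hahn-Banach theorem, we can extend $\phi$ to a linear functional $\tilde\phi$ of the same norm on the space of all functions on $T$. Since $T$ is finite, there exists a function $f$ on $T$ such that $\tilde\phi(h) = \int fh\,d\mu$ for all functions $h$ on $T$, and $\|f\|_1 = \|\tilde\phi\| \le \Gamma_{\mathrm{si}}\sup_{\chi\in\Gamma}|\beta_\chi|$. In particular, $\int fh\,d\mu = \phi(h)$ whenever $h \in V$. Taking $h = \chi$ implies (19.143). □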


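Lemma 19.4.3 Consider for $\chi \in \Gamma$ numbers $\varepsilon_\chi = \pm 1$. Then for each $p \ge 1$ and each sequence of numbers $(a_\chi)_{\chi\in\Gamma}$, we have

$$\int\Big|\sum_{\chi\in\Gamma}a_\chi\chi\Big|^p d\mu \le \Gamma_{\mathrm{si}}^p\int\Big|\sum_{\chi\in\Gamma}\varepsilon_\chi a_\chi\chi\Big|^p d\mu . \tag{19.145}$$

Proof According to Lemma 19.4.2, we can find a function $f$ on $T$ such that $\int f\chi\,d\mu = \varepsilon_\chi$ for each $\chi \in \Gamma$, whereas $\|f\|_1 \le \Gamma_{\mathrm{si}}$. Let us define $g(t) = \sum_{\chi\in\Gamma}\varepsilon_\chi a_\chi\chi(t)$. Thus, for $t \in T$, since $\varepsilon_\chi^2 = 1$ and $\chi(x)\chi(t) = \chi(x+t)$, we have

$$\begin{aligned}
\sum_{\chi\in\Gamma}a_\chi\chi(t) &= \sum_{\chi\in\Gamma}a_\chi\varepsilon_\chi\chi(t)\varepsilon_\chi
= \sum_{\chi\in\Gamma}a_\chi\varepsilon_\chi\chi(t)\int f(x)\chi(x)\,d\mu(x) \\
&= \sum_{\chi\in\Gamma}a_\chi\varepsilon_\chi\int f(x)\chi(x+t)\,d\mu(x)
= \int f(x)g(x+t)\,d\mu(x) .
\end{aligned} \tag{19.146}$$

If a function $h$ satisfies $\int|h(x)|\,d\mu(x) = 1$, we have

$$\begin{aligned}
\Big|\int h(x)g(x+t)\,d\mu(x)\Big|^p &\le \Big(\int|h(x)||g(x+t)|\,d\mu(x)\Big)^p \\
&\le \int|h(x)||g(x+t)|^p\,d\mu(x) ,
\end{aligned}$$

where in the second line, we use the convexity of the map $x \mapsto |x|^p$. Using this for the function $h(x) = f(x)/\|f\|_1$, we obtain

$$\Big|\int f(x)g(x+t)\,d\mu(x)\Big|^p \le \|f\|_1^{p-1}\int|f(x)||g(x+t)|^p\,d\mu(x) ,$$

and combining with (19.146), we obtain

$$\Big|\sum_{\chi\in\Gamma}a_\chi\chi(t)\Big|^p \le \Gamma_{\mathrm{si}}^{p-1}\int|f(x)||g(x+t)|^p\,d\mu(x) .$$

Integrating in $t$ and using the translation invariance of $\mu$ yields

$$\begin{aligned}
\int\Big|\sum_{\chi\in\Gamma}a_\chi\chi(t)\Big|^p d\mu(t) &\le \Gamma_{\mathrm{si}}^{p-1}\iint|f(x)||g(x+t)|^p\,d\mu(x)\,d\mu(t) \\
&= \Gamma_{\mathrm{si}}^{p-1}\|f\|_1\int|g(t)|^p\,d\mu(t) \le \Gamma_{\mathrm{si}}^{p}\int|g(t)|^p\,d\mu(t) . \qquad \square
\end{aligned}$$

Proof of (19.140). We average (19.145) over all choices of $\varepsilon_\chi = \pm 1$, using Khintchin's inequality (6.3). □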
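Identity (19.146) can be checked numerically on a small finite group. The sketch below assumes $T = \mathbb{Z}/n\mathbb{Z}$ with characters $\chi_k(t) = e^{2\pi i kt/n}$, and takes for $f$ the simple (not norm-minimizing) choice $f = \sum_{\chi\in\Gamma}\varepsilon_\chi\overline{\chi}$, which satisfies $\int f\chi\,d\mu = \varepsilon_\chi$ by orthonormality; it is Lemma 19.4.2 that provides an $f$ with the additional bound $\|f\|_1 \le \Gamma_{\mathrm{si}}$. The frequencies, the seed and the function name are arbitrary.

```python
import cmath
import random


def check_identity_19_146(n: int = 12, freqs=(1, 3, 4, 7), seed: int = 0) -> float:
    """Max over t in Z/nZ of |sum_k a_k chi_k(t) - int f(x) g(x + t) dmu(x)|."""
    rng = random.Random(seed)

    def chi(k, t):
        return cmath.exp(2j * cmath.pi * k * t / n)

    eps = {k: rng.choice((-1, 1)) for k in freqs}
    a = {k: complex(rng.gauss(0, 1), rng.gauss(0, 1)) for k in freqs}

    def f(x):  # int f chi_k dmu = eps_k for k in freqs (frequencies distinct mod n)
        return sum(eps[k] * chi(k, x).conjugate() for k in freqs)

    def g(x):
        return sum(eps[k] * a[k] * chi(k, x) for k in freqs)

    worst = 0.0
    for t in range(n):
        lhs = sum(a[k] * chi(k, t) for k in freqs)
        rhs = sum(f(x) * g((x + t) % n) for x in range(n)) / n  # mu = uniform measure
        worst = max(worst, abs(lhs - rhs))
    return worst


print(check_identity_19_146())  # of the order of rounding error
```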


Next, we turn to the proof of (19.141). We will deduce it from the considerably 
more general recent result of [89]. 


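Theorem 19.4.4 For $j = 1, 2$, consider a sequence $(\varphi_{j,n})_{n\le N}$ of functions on a probability space $(\Omega, d\nu)$. Assume that each of these sequences is an orthonormal system.$^{22}$ Assume that for certain numbers $A, B$, for $j = 1, 2$, and all numbers $(\alpha_n)_{n\le N}$, we have

$$\forall n \le N,\quad \|\varphi_{j,n}\|_\infty \le A , \tag{19.147}$$

$$\Big\|\sum_{n\le N}\alpha_n\varphi_{j,n}\Big\|_{\psi_2} \le B\Big(\sum_{n\le N}|\alpha_n|^2\Big)^{1/2} . \tag{19.148}$$

Then for all complex numbers $(\alpha_n)_{n\le N}$, we have

$$\sum_{n\le N}|\alpha_n| \le LA^2(A+B)^4\Big\|\sum_{n\le N}\alpha_n\,\varphi_{1,n}\otimes\varphi_{2,n}\Big\|_\infty , \tag{19.149}$$

where $\varphi_{1,n}\otimes\varphi_{2,n}$ is the function on $\Omega\times\Omega$ defined by $\varphi_{1,n}\otimes\varphi_{2,n}(\omega_1,\omega_2) = \varphi_{1,n}(\omega_1)\varphi_{2,n}(\omega_2)$.

$^{22}$ That is, each vector is of norm 1 and they are orthogonal.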


648 19 Applications to Banach Space Theory 


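Proof of (19.141). Consider a set $\Gamma = (\chi_n)_{n\le N}$ of characters. Let $\varphi_{1,n} = \varphi_{2,n} = \chi_n$, so that (19.147) holds for $A = 1$ and (19.148) holds for $B = \Gamma^*_{\mathrm{sg}} \le L\Gamma_{\mathrm{sg}}$. Taking $(\Omega, \nu) = (T, \mu)$, since $\chi_n(s)\chi_n(t) = \chi_n(s+t)$, (19.149) becomes $\sum_{n\le N}|\alpha_n| \le L(1 + \Gamma_{\mathrm{sg}})^4\|\sum_{n\le N}\alpha_n\chi_n\|_\infty$, and this proves that $\Gamma_{\mathrm{si}} \le L(1 + \Gamma_{\mathrm{sg}})^4$. □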


Throughout the rest of this section, we write $T = \{-1,1\}^N$, and $\mu$ denotes the Haar measure. We denote by $\varepsilon_n$ the $n$-th coordinate function on $T$. The first ingredient of the proof of Theorem 19.4.4 is the following:

Proposition 19.4.5 Under the conditions of Theorem 19.4.4, there exist operators $U_j : L^1(T,\mu) \to L^1(\Omega,\nu)$ such that $U_j(\varepsilon_n) = \varphi_{j,n}$ and $\|U_j\| \le L(A+B)$.

To understand the issue there, let us first prove a simple fact about operators from $\ell^1 = \ell^1(\mathbb{N})$ to $L^1(\Omega,\nu)$. Given elements $u_1,\ldots,u_m$ of $L^1(\Omega,\nu)$, we define $\max_{k\le m}|u_k|$ pointwise, $(\max_{k\le m}|u_k|)(\omega) = \max_{k\le m}|u_k(\omega)|$. Similarly, elements of $\ell^1$ are seen as functions on $\mathbb{N}$.


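Lemma 19.4.6 Consider a bounded operator $U : \ell^1 \to L^1(\Omega,\nu)$. Then given elements $f_1,\ldots,f_m$ of $\ell^1$, we have

$$\Big\|\max_{k\le m}|U(f_k)|\Big\|_1 \le \|U\|\,\Big\|\max_{k\le m}|f_k|\Big\|_1 . \tag{19.150}$$

Proof Let $(e_n)$ be the canonical basis of $\ell^1$ and $h_n = U(e_n)$, so that $\|h_n\|_1 \le \|U\|$. Consider elements $f_1,\ldots,f_m$ of $\ell^1$ with $f_k = \sum_{n\ge 1}a_{k,n}e_n$. Then

$$\max_{k\le m}|U(f_k)| = \max_{k\le m}\Big|\sum_{n\ge 1}a_{k,n}h_n\Big| \le \sum_{n\ge 1}\max_{k\le m}|a_{k,n}|\,|h_n|$$

and thus

$$\Big\|\max_{k\le m}|U(f_k)|\Big\|_1 \le \sum_{n\ge 1}\max_{k\le m}|a_{k,n}|\,\|h_n\|_1 \le \|U\|\sum_{n\ge 1}\max_{k\le m}|a_{k,n}| .$$

Finally, $\sum_{n\ge 1}\max_{k\le m}|a_{k,n}| = \|\max_{k\le m}|f_k|\|_1$. □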
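In finite dimensions, Lemma 19.4.6 is easy to test. The sketch below assumes that $\ell^1$ is replaced by $\ell^1_n$, that $\Omega$ is a finite set with uniform weights, and that $U$ is given by the functions $h_n = U(e_n)$; it uses the standard fact that the norm of an operator defined on $\ell^1$ is $\max_n\|U(e_n)\|_1$. All dimensions, random data and helper names are illustrative.

```python
import random


def l1(values, weights):
    return sum(w * abs(v) for v, w in zip(values, weights))


def apply_U(H, f):
    """U(f) = sum_n f[n] * h_n, where H[n] lists the values of h_n = U(e_n) on Omega."""
    return [sum(f[n] * H[n][w] for n in range(len(f))) for w in range(len(H[0]))]


def both_sides_19_150(H, weights, fs):
    """Return (|| max_k |U f_k| ||_1, ||U|| * || max_k |f_k| ||_1)."""
    op_norm = max(l1(h, weights) for h in H)  # norm of an operator on l^1
    images = [apply_U(H, f) for f in fs]
    lhs = l1([max(abs(img[w]) for img in images) for w in range(len(weights))], weights)
    rhs = op_norm * sum(max(abs(f[n]) for f in fs) for n in range(len(fs[0])))
    return lhs, rhs


rng = random.Random(7)
n_dim, omega_size, m = 6, 10, 4
weights = [1 / omega_size] * omega_size  # nu = uniform probability on Omega
H = [[rng.uniform(-1, 1) for _ in range(omega_size)] for _ in range(n_dim)]
fs = [[rng.uniform(-1, 1) for _ in range(n_dim)] for _ in range(m)]
print(both_sides_19_150(H, weights, fs))  # first entry <= second entry
```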


Mireille Lévy proved the following converse to Lemma 19.4.6. The proof does 
not use probabilistic ideas but relies on the Hahn-Banach theorem, and we refer the 
reader to [56] for it. 


Lemma 19.4.7 Consider a subspace $E$ of $\ell^1$ and an operator $V$ from $E$ to $L^1(\Omega,\nu)$. Assume that for a certain number $C$ and any elements $f_1,\ldots,f_m$ of $E$, we have

$$\Big\|\max_{k\le m}|V(f_k)|\Big\|_1 \le C\Big\|\max_{k\le m}|f_k|\Big\|_1 . \tag{19.151}$$

Then there exists an operator $U : \ell^1 \to L^1(\Omega,\nu)$ such that $\|U\| \le C$ and such that the restriction of $U$ to $E$ coincides with $V$.


Proof of Proposition 19.4.5. Since $T$ is finite, for each point $t \in T$ we have $\mu(\{t\}) = 1/\operatorname{card} T$. Thus, the space $L^1(T,\mu)$ is isomorphic to a space $\ell^1$. We consider the span $E$ of the elements $\varepsilon_n$ in $L^1(T,\mu)$, and fixing $j \in \{1,2\}$, we consider on $E$ the operator $V$ defined by $V(\varepsilon_n) = \varphi_{j,n}$. The plan is to prove (19.151) for $C = L(A+B)$ and to use Lemma 19.4.7.

Let $f_k = \sum_{n\le N}a_{k,n}\varepsilon_n$. Thinking of $(T,\mu)$ as a probability space, and denoting accordingly by $\mathsf{E}$ integration with respect to $\mu$, we then have

$$S := \Big\|\max_{k\le m}|f_k|\Big\|_1 = \mathsf{E}\max_{k\le m}\Big|\sum_{n\le N}a_{k,n}\varepsilon_n\Big| .$$

We recognize the supremum of a Bernoulli process. Setting $a_{0,n} = 0$, for $0 \le k \le m$, consider the sequence $a_k = (a_{k,n})_{n\le N}$. We can then use Theorem 6.2.8 to find a set $W \subset \ell^2(\mathbb{N})$ with $\gamma_2(W) \le LS$ such that each sequence $a_k$ can be decomposed as $a_k' + a_k''$, where $a_k' \in W$ and where $\sum_{n\le N}|a''_{k,n}| \le LS$. Since $a_0 = 0$, we have $a_0' + a_0'' = 0$. We may then replace $a_k'$ by $a_k' - a_0'$ and $a_k''$ by $a_k'' - a_0''$. This replaces $W$ by $W - a_0'$, so that now $0 \in W$, and consequently $\|a\|_2 \le LS$ for $a \in W$. Since $V(\varepsilon_n) = \varphi_{j,n}$, we then have

$$\max_{k\le m}|V(f_k)| = \max_{k\le m}\Big|\sum_{n\le N}a_{k,n}\varphi_{j,n}\Big| \le \mathrm{I} + \mathrm{II} , \tag{19.152}$$

where

$$\mathrm{I} = \max_{k\le m}\Big|\sum_{n\le N}a'_{k,n}\varphi_{j,n}\Big| ;\qquad \mathrm{II} = \max_{k\le m}\Big|\sum_{n\le N}a''_{k,n}\varphi_{j,n}\Big| .$$

We will prove that

$$\|\mathrm{I}\|_1 \le LBS ;\qquad \|\mathrm{II}\|_1 \le LAS . \tag{19.153}$$

Combining with (19.152), this proves, as desired, that $\|\max_{k\le m}|V(f_k)|\|_1 \le L(A+B)\|\max_{k\le m}|f_k|\|_1$ and concludes the proof.

Since $\|\varphi_{j,n}\|_\infty \le A$ by (19.147) and since $\sum_{n\le N}|a''_{k,n}| \le LS$, it follows that $\|\mathrm{II}\|_\infty \le LAS$, so that $\|\mathrm{II}\|_1 \le LAS$. To control the term $\mathrm{I}$, let us consider the process $Y_a := \sum_{n\le N}a_n\varphi_{j,n}$ for $a \in W$, so that $\|\mathrm{I}\|_1 \le \mathsf{E}\sup_{a\in W}|Y_a|$. Then (19.148) means that the process $X_a = Y_a/(LB)$ satisfies the increment condition (2.4). Using (2.60) together with the fact that $\mathsf{E}|Y_a| \le LS$ because $\|a\|_2 \le LS$ for $a \in W$, this implies that $\|\mathrm{I}\|_1 \le LBS$ and finishes the proof of (19.153). □
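The next ingredient of the proof of Theorem 19.4.4 is a nearly magical observation. Recalling that $T = \{-1,1\}^N$, for $n \le N$ and $j = 1,2$, we define on $T\times T$ the functions $\varepsilon_n^j$ by $\varepsilon_n^j(t^1,t^2) = t^j_n$, where $t^j = (t^j_n)_{n\le N} \in T$.

Proposition 19.4.8 Given $0 < \delta \le 1$, we can decompose the function $F = \sum_{n\le N}\varepsilon_n^1\varepsilon_n^2$ as $F = F_1 + F_2$ in a way that

$$\|F_1\|_1 = \iint|F_1|\,d\mu\otimes d\mu \le 2/\delta \tag{19.154}$$

and such that for any two functions $g_1$ and $g_2$ on $T$, we have

$$\Big|\iint F_2\,g_1\otimes g_2\,d\mu\otimes d\mu\Big| \le \delta\|g_1\|_2\|g_2\|_2 . \tag{19.155}$$

Given a set $I \subset \{1,\ldots,N\}$, we define the function $\varepsilon_I$ on $T$ by $\varepsilon_\emptyset = 1$ and, if $I \ne \emptyset$, by $\varepsilon_I(t) = \prod_{n\in I}t_n$. As $I$ varies, the functions $\varepsilon_I$ form an orthonormal basis of $L^2(T, d\mu)$. For $j = 1,2$ and a set $I \subset \{1,\ldots,N\}$, we define $\varepsilon_I^j = \prod_{n\in I}\varepsilon_n^j$, so that $\varepsilon_I^1\varepsilon_J^2 = \varepsilon_I\otimes\varepsilon_J$. As $I$ and $J$ vary, the functions $\varepsilon_I^1\varepsilon_J^2$ form an orthonormal basis of $L^2(T\times T, d\mu\otimes d\mu)$.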


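Lemma 19.4.9 The function $\tau := \sum_{I\subset\{1,\ldots,N\}}\delta^{\operatorname{card} I}\varepsilon_I^1\varepsilon_I^2$ satisfies $\|\tau\|_1 = 1$.

Proof Indeed, $\tau = \prod_{n\le N}(1 + \delta\varepsilon_n^1\varepsilon_n^2)$ is $\ge 0$ and of integral 1. □

Proof of Proposition 19.4.8. Define $F_1 = (\tau - 1)/\delta = \sum_{\operatorname{card} I\ge 1}\delta^{\operatorname{card} I - 1}\varepsilon_I^1\varepsilon_I^2$, so that $\|F_1\|_1 \le 2/\delta$ by the previous lemma. Then $F_2 := F - F_1 = -\sum_{\operatorname{card} I\ge 2}\delta^{\operatorname{card} I - 1}\varepsilon_I^1\varepsilon_I^2$. For $j = 1,2$, let us decompose the function $g_j$ on $T$ in the basis $(\varepsilon_I)$: $g_j = \sum_I g_{j,I}\varepsilon_I$. Then $g_1\otimes g_2 = \sum_{I,J}g_{1,I}g_{2,J}\varepsilon_I^1\varepsilon_J^2$, and thus, using the Cauchy-Schwarz inequality,

$$\begin{aligned}
\Big|\iint F_2\,g_1\otimes g_2\,d\mu\otimes d\mu\Big| &= \Big|\sum_{\operatorname{card} I\ge 2}\delta^{\operatorname{card} I - 1}g_{1,I}g_{2,I}\Big| \le \delta\sum_I|g_{1,I}g_{2,I}| \\
&\le \delta\Big(\sum_I|g_{1,I}|^2\Big)^{1/2}\Big(\sum_I|g_{2,I}|^2\Big)^{1/2} = \delta\|g_1\|_2\|g_2\|_2 . \qquad \square
\end{aligned}$$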
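For small $N$, the objects in Lemma 19.4.9 and Proposition 19.4.8 can be computed exhaustively on $T\times T$. The sketch below (with $N = 3$ and $\delta = 1/2$ chosen arbitrarily, and real-valued random test functions $g_1, g_2$) checks that $\tau \ge 0$ with integral 1, that $\|F_1\|_1 \le 2/\delta$, and the bound (19.155) on sampled pairs $(g_1, g_2)$. The function name is hypothetical.

```python
import itertools
import math
import random


def check_prop_19_4_8(N: int = 3, delta: float = 0.5, trials: int = 200, seed: int = 0) -> dict:
    cube = list(itertools.product((-1, 1), repeat=N))  # T = {-1,1}^N
    pairs = [(s, t) for s in cube for t in cube]       # T x T
    w = 1 / len(pairs)                                  # mu (x) mu is uniform
    tau = {(s, t): math.prod(1 + delta * s[n] * t[n] for n in range(N)) for s, t in pairs}
    F = {(s, t): sum(s[n] * t[n] for n in range(N)) for s, t in pairs}
    F1 = {k: (tau[k] - 1) / delta for k in pairs}
    F2 = {k: F[k] - F1[k] for k in pairs}

    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        g1 = {s: rng.gauss(0, 1) for s in cube}
        g2 = {t: rng.gauss(0, 1) for t in cube}
        integral = sum(w * F2[(s, t)] * g1[s] * g2[t] for s, t in pairs)
        n1 = math.sqrt(sum(v * v for v in g1.values()) / len(cube))  # L^2(mu) norms
        n2 = math.sqrt(sum(v * v for v in g2.values()) / len(cube))
        worst = max(worst, abs(integral) / (n1 * n2))
    return {
        "min_tau": min(tau.values()),
        "mean_tau": sum(tau.values()) * w,
        "F1_L1": sum(abs(v) for v in F1.values()) * w,
        "worst_ratio_155": worst,  # should not exceed delta
    }


print(check_prop_19_4_8())
```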


Lemma 19.4.10 Under the hypotheses of Theorem 19.4.4, consider numbers $(\theta_n)_{n\le N}$ with $|\theta_n| = 1$. Then given any $0 < \delta \le 1$, the function $\Phi := \sum_{n\le N}\theta_n\varphi_{1,n}\otimes\varphi_{2,n}$ can be written as $\Phi_1 + \Phi_2$, where $\|\Phi_1\|_1 \le L(A+B)^2/\delta$ and where for any functions $g_1, g_2$ on $\Omega$,

$$\Big|\iint\Phi_2\,g_1\otimes g_2\,d\nu\otimes d\nu\Big| \le L\delta(A+B)^2\|g_1\|_\infty\|g_2\|_\infty . \tag{19.156}$$

Proof We may assume that $\theta_n = 1$ by replacing $\varphi_{1,n}$ by $\theta_n\varphi_{1,n}$. Consider the operators $U_j$ as provided by Proposition 19.4.5. There exists an operator $U_1\otimes U_2 : L^1(T\times T, \mu\otimes\mu) \to L^1(\Omega\times\Omega, \nu\otimes\nu)$ with the property that $U_1\otimes U_2(f_1\otimes f_2) = U_1(f_1)\otimes U_2(f_2)$. This is obvious by considering first the functions $f_1, f_2$ which are supported by a single point. This construction also shows that it has a norm $\le \|U_1\|\|U_2\|$. Furthermore, $\Phi = \sum_{n\le N}\varphi_{1,n}\otimes\varphi_{2,n} = U_1\otimes U_2(\sum_{n\le N}\varepsilon_n\otimes\varepsilon_n)$.

Considering the functions $F_1$ and $F_2$ of Proposition 19.4.8, we set $\Phi_1 = U_1\otimes U_2(F_1)$ and $\Phi_2 = U_1\otimes U_2(F_2)$, so that $\Phi = \Phi_1 + \Phi_2$. Now, $\|\Phi_1\|_1 \le \|U_1\|\|U_2\|\|F_1\|_1 \le L(A+B)^2/\delta$. It remains only to prove (19.156). Let $U_j^* : L^\infty(\Omega,\nu) \to L^\infty(T,\mu)$ denote the adjoint of $U_j$. Given a function $H \in L^1(T\times T)$, the identity

$$\iint U_1\otimes U_2(H)\,g_1\otimes g_2\,d\nu\otimes d\nu = \iint H\,U_1^*(g_1)\otimes U_2^*(g_2)\,d\mu\otimes d\mu \tag{19.157}$$

holds because it holds when $H$ is of the type $H_1\otimes H_2$ and because the elements of that type span $L^1(T\times T)$. Also, $\|U_j^*(g_j)\|_2 \le \|U_j^*(g_j)\|_\infty \le L(A+B)\|g_j\|_\infty$. Using (19.157) for $H = F_2$ and (19.155) yields (19.156). □


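Proof of Theorem 19.4.4. Let us fix the numbers $\alpha_n$ and consider $\theta_n$ with $|\theta_n| = 1$ and $\alpha_n\theta_n = |\alpha_n|$. Since the systems $(\varphi_{1,n})$ and $(\varphi_{2,n})$ are orthonormal, we have

$$\sum_{n\le N}|\alpha_n| = \iint\Psi\Phi\,d\nu\otimes d\nu , \tag{19.158}$$

where $\Psi = \sum_{n\le N}\alpha_n\varphi_{1,n}\otimes\varphi_{2,n}$ and $\Phi = \sum_{n\le N}\theta_n\overline{\varphi_{1,n}}\otimes\overline{\varphi_{2,n}}$. Let us then use the decomposition $\Phi = \Phi_1 + \Phi_2$ provided by Lemma 19.4.10 (applied to the systems $(\overline{\varphi_{j,n}})$, which satisfy the same hypotheses). First,

$$\Big|\iint\Phi_1\Psi\,d\nu\otimes d\nu\Big| \le \|\Phi_1\|_1\|\Psi\|_\infty \le L\delta^{-1}(A+B)^2\|\Psi\|_\infty .$$

Also, using (19.156), we have

$$\Big|\iint\Phi_2\Psi\,d\nu\otimes d\nu\Big| \le \sum_{n\le N}|\alpha_n|\,\Big|\iint\Phi_2\,\varphi_{1,n}\otimes\varphi_{2,n}\,d\nu\otimes d\nu\Big| \le L\delta A^2(A+B)^2\sum_{n\le N}|\alpha_n| .$$

Then (19.158) yields

$$\sum_{n\le N}|\alpha_n| \le L\delta^{-1}(A+B)^2\|\Psi\|_\infty + L\delta A^2(A+B)^2\sum_{n\le N}|\alpha_n| ,$$

from which (19.149) follows by taking $\delta = 1/(2LA^2(A+B)^2)$. □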


Appendix A 
Discrepancy for Convex Sets 


A.1 Introduction 


The purpose of this appendix is to bring forward the following (equally beautiful) close cousin of the Leighton-Shor theorem. We denote by $\lambda_3$ the usual volume measure and by $(X_i)_{i\le N}$ independent uniformly distributed points in $[0,1]^3$.

Theorem A.1.1 Consider the class $\mathcal{C}$ of convex sets in $\mathbb{R}^3$. Then

$$\frac{1}{L}\sqrt{N}(\log N)^{3/4} \le \mathsf{E}\sup_{C\in\mathcal{C}}\Big|\sum_{i\le N}\big(\mathbf{1}_C(X_i) - \lambda_3(C)\big)\Big| \le L\sqrt{N}(\log N)^{3/4} . \tag{A.1}$$

The upper bound is proved in [113]. In this appendix, we sketch how to adapt the lower bound machinery of Chap. 4 to the present case. The following exercise highlights the parallels between Theorem A.1.1 and the Leighton-Shor theorem:

Exercise A.1.2 Convince yourself that, as a consequence of the Leighton-Shor theorem, (A.1) holds when $(X_i)_{i\le N}$ are uniformly distributed points in $[0,1]^2$ and $\mathcal{C}$ is the class of sets which are the interior of a closed curve of length $\le 1$.

Consider independent uniformly distributed points $(X_i)_{i\le N}$ in $[0,1]^k$ and a class $\mathcal{C}$ of subsets of $[0,1]^k$, and define $S_N := \mathsf{E}\sup_{C\in\mathcal{C}}|\sum_{i\le N}(\mathbf{1}_C(X_i) - \lambda_k(C))|$. This quantity is of interest when there is some constraint on the size of $\mathcal{C}$. Interestingly, the constraint in dimension $k = 3$ that the elements of $\mathcal{C}$ are convex and the constraint in dimension $k = 2$ that they are the interiors of curves of length $\le 1$ yield the same rate of growth for $S_N$.
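To make the quantity $S_N$ concrete, the sketch below computes, for one sample of $N$ uniform points in $[0,1]^3$, the supremum of $|\sum_{i\le N}(\mathbf{1}_C(X_i) - \lambda_3(C))|$ over a small finite family of convex sets, namely the boxes $[0,a]\times[0,b]\times[0,c]$ with $a, b, c \in \{1/G,\ldots,1\}$. This is only a toy: the family is far smaller than $\mathcal{C}$, so the value is merely a lower bound for the supremum over all convex sets for that sample. The values of $N$, $G$ and the function name are arbitrary.

```python
import random


def box_discrepancy(points, G: int) -> float:
    """max over a, b, c in {1/G, ..., 1} of |#{i : X_i in [0,a]x[0,b]x[0,c]} - N*a*b*c|."""
    N = len(points)
    best = 0.0
    for i in range(1, G + 1):
        for j in range(1, G + 1):
            for k in range(1, G + 1):
                a, b, c = i / G, j / G, k / G
                count = sum(1 for x, y, z in points if x <= a and y <= b and z <= c)
                best = max(best, abs(count - N * a * b * c))
    return best


rng = random.Random(5)
pts = [(rng.random(), rng.random(), rng.random()) for _ in range(500)]
print(box_discrepancy(pts, 8))
```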
A.2 Elements of Proof of the Upper Bound

Unfortunately, a complete proof of the upper bound in (A.1) requires considerable work, not all of which is very exciting, so we will give only a very short outline of it. The understanding of subgraphs$^1$ of convex functions is closely related to the understanding of convex sets. In fact, it is rather elementary to show that the boundary of a convex set in $\mathbb{R}^3$ can be broken into six pieces which are graphs of convex 1-Lipschitz functions (defined on a subset of $\mathbb{R}^2$).

It is time to recall that a twice differentiable function $g$ on $[0,1]^2$ is convex if and only if at each point we have $\partial^2 g/\partial x^2 \ge 0$ and

$$\frac{\partial^2 g}{\partial x^2}\,\frac{\partial^2 g}{\partial y^2} \ge \Big(\frac{\partial^2 g}{\partial x\partial y}\Big)^2 . \tag{A.2}$$

We denote by $\lambda_2$ the two-dimensional Lebesgue measure on $[0,1]^2$.

Lemma A.2.1 A convex, twice differentiable, 1-Lipschitz function $g$ on $[0,1]^2$ satisfies

$$\int\Big|\frac{\partial^2 g}{\partial x^2}\Big|\,d\lambda_2 \le 2 ;\quad \int\Big|\frac{\partial^2 g}{\partial y^2}\Big|\,d\lambda_2 \le 2 ;\quad \int\Big|\frac{\partial^2 g}{\partial x\partial y}\Big|\,d\lambda_2 \le 2 . \tag{A.3}$$

Proof We write $\int_0^1 \partial^2 g/\partial x^2(x,y)\,dx = \partial g/\partial x(1,y) - \partial g/\partial x(0,y) \le 2$, and we integrate over $y \in [0,1]$ to obtain the first part of (A.3). The second part is similar, and the third follows from the first two parts using (A.2) and the Cauchy-Schwarz inequality. □


An important step in the proof of Theorem A.1.1 is as follows:

Theorem A.2.2 The class $\mathcal{C}$ of functions $[0,1]^2 \to [0,1]$ which satisfy (A.3) is such that $\gamma_{1,2}(\mathcal{C}) < \infty$.

To understand why this theorem can be true, consider first the smaller class $\mathcal{C}^*$ consisting of functions which are zero on the boundary of $[0,1]^2$ and which satisfy the stronger conditions $\|\partial^2 g/\partial x^2\|_2 \le L$ and $\|\partial^2 g/\partial y^2\|_2 \le L$. Then the use of the Fourier transform shows that $\mathcal{C}^*$ is isometric to a subset of the ellipsoid $\sum_{n,m}(1 + n^4 + m^4)|x_{n,m}|^2 \le L$, and Corollary 4.1.7 proves that $\gamma_{1,2}(\mathcal{C}^*) < \infty$. Surely, the assumption that the function is zero on the boundary of $[0,1]^2$ brings only lower-order effects. The fact that for a function in $\mathcal{C}$ we require only integrability of the partial derivatives as in (A.3) (rather than square integrability for a function of $\mathcal{C}^*$) is a far more serious problem. It is solved in [113] following the same approach as in Sect. 17.3: One shows that a function $f \in \mathcal{C}$ can be written as a sum $\sum_{k\ge 0}f_k$, where $f_k \in \mathcal{C}_k$, the class of functions $g$ which satisfy $\|\partial^2 g/\partial x^2\|_\infty \le 2^k$, $\|\partial^2 g/\partial y^2\|_\infty \le 2^k$ and moreover $\lambda_2(\{g \ne 0\}) \le L2^{-k}$. One main step of the proof is to show that the sequence $(\gamma_{1,2}(\mathcal{C}_k))$ decreases geometrically (in fact, $\gamma_{1,2}(\mathcal{C}_k) \le Lk2^{-k/2}$).

$^1$ Look at (4.118) if you forgot what the subgraph of a function is.

A.3 The Lower Bound 


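Our strategy to construct convex functions is based on the following elementary lemma:

Lemma A.3.1 Consider a function $h : [0,1]^2 \to \mathbb{R}$ and assume the following:

$$h(0,0) = 0 = \frac{\partial h}{\partial x}(0,0) = \frac{\partial h}{\partial y}(0,0) , \tag{A.4}$$

$$\Big\|\frac{\partial^2 h}{\partial x^2}\Big\|_\infty \le \frac{1}{16} ;\quad \Big\|\frac{\partial^2 h}{\partial y^2}\Big\|_\infty \le \frac{1}{16} ;\quad \Big\|\frac{\partial^2 h}{\partial x\partial y}\Big\|_\infty \le \frac{1}{16} . \tag{A.5}$$

Then the function $g$ given by

$$g(x,y) = \frac{1}{2} + \frac{1}{8}(x^2 + y^2) + h(x,y) \tag{A.6}$$

is valued in $[0,1]$ and is convex.

Proof It is elementary to prove that $|h(x,y)| \le 1/4$, so that $g$ is valued in $[0,1]$. Moreover, for each $x, y \in [0,1]$, we have

$$\frac{\partial^2 g}{\partial x^2}(x,y) = \frac{1}{4} + \frac{\partial^2 h}{\partial x^2}(x,y) \ge \frac{1}{4} - \frac{1}{16} \ge \frac{1}{8} ,$$

and similarly $\partial^2 g/\partial y^2(x,y) \ge 1/8$ and $|\partial^2 g/\partial x\partial y(x,y)| \le 1/16$, so that (A.2) is satisfied and thus $g$ is convex. □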
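As a sanity check of Lemma A.3.1, one can take a specific $h$ satisfying (A.4) and (A.5), here the arbitrarily chosen quadratic $h(x,y) = (x^2 - y^2)/32 + xy/32$, whose second derivatives are $1/16$, $-1/16$ and $1/32$, and verify on a grid, with central finite differences, that $g$ of (A.6) takes values in $[0,1]$ and satisfies the convexity criterion (A.2). This is an illustration for one example, not a proof.

```python
def h(x, y):
    # h(0,0) = 0, grad h(0,0) = 0; h_xx = 1/16, h_yy = -1/16, h_xy = 1/32
    return (x * x - y * y) / 32 + x * y / 32


def g(x, y):
    return 0.5 + (x * x + y * y) / 8 + h(x, y)  # the function of (A.6)


def hessian(fun, x, y, e=1e-3):
    """Central finite-difference Hessian (exact up to rounding for quadratics)."""
    fxx = (fun(x + e, y) - 2 * fun(x, y) + fun(x - e, y)) / e**2
    fyy = (fun(x, y + e) - 2 * fun(x, y) + fun(x, y - e)) / e**2
    fxy = (fun(x + e, y + e) - fun(x + e, y - e) - fun(x - e, y + e) + fun(x - e, y - e)) / (4 * e**2)
    return fxx, fyy, fxy


grid = [k / 20 for k in range(21)]
ok = True
for x in grid:
    for y in grid:
        gxx, gyy, gxy = hessian(g, x, y)
        ok &= 0.0 <= g(x, y) <= 1.0
        ok &= gxx >= 1 / 8 - 1e-6 and gyy >= 1 / 8 - 1e-6 and abs(gxy) <= 1 / 16 + 1e-6
        ok &= gxx >= 0 and gxx * gyy >= gxy * gxy  # criterion (A.2)
print(ok)
```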


Thus, to construct large families of convex functions, it will suffice to construct large families of functions satisfying (A.4) and (A.5). We will do this using a variation on the method of Sects. 4.6 and 4.8. The control of the mixed partial derivatives requires a more clever choice of our basic function, to which we turn now.

Let us consider the function $f$ on $\mathbb{R}$ given by $f(x) = 0$ unless $0 \le x \le 1$, $f(0) = f'(0) = 0$, $f''(x) \in \{-1,1\}$, $f''(x) = 1$ if $1/8 \le x < 3/8$, $4/8 \le x < 5/8$ or $7/8 \le x \le 1$, and $f''(x) = -1$ otherwise (Fig. A.1).

We observe that $f''(1-x) = -f''(x)$, $f'(1-x) = f'(x)$, and $f(1-x) = -f(x)$, and also that

$$\int f\,d\lambda = \int f'\,d\lambda = \int f''\,d\lambda = 0 . \tag{A.7}$$

Fig. A.1 The graph of $f'$

For $q \ge 1$ and $1 \le \ell \le 2^q$, let us define $f_{q,\ell}$ by $f_{q,\ell}(x) = 2^{-2q}f(2^q(x - (\ell-1)2^{-q}))$. We note right away that

$$f_{q,\ell}(x) \ne 0 \Rightarrow (\ell-1)2^{-q} < x < \ell 2^{-q} , \tag{A.8}$$

so that at a given $q$, the functions $f_{q,\ell}$ have disjoint supports. Let us list some elementary properties of these functions. We denote by $\lambda$ the Lebesgue measure on $[0,1]$. The following lemma resembles Lemma 4.6.2, but the nontrivial new piece of information is (A.17).


Lemma A.3.2 We have the following:

$$\|f''_{q,\ell}\|_\infty = 1 . \tag{A.9}$$

$$\|f''_{q,\ell}\|_2^2 = 2^{-q} . \tag{A.10}$$

$$\|f'_{q,\ell}\|_\infty \le 2^{-q} . \tag{A.11}$$

$$\|f'_{q,\ell}\|_2^2 \le 2^{-3q} . \tag{A.12}$$

$$\|f_{q,\ell}\|_\infty \le 2^{-2q} . \tag{A.13}$$

$$\|f_{q,\ell}\|_2^2 \le 2^{-5q} . \tag{A.14}$$

$$\|f_{q,\ell}\|_1 \ge 2^{-3q}/L . \tag{A.15}$$

$$q' \ge q + 3 \Rightarrow \int f''_{q,\ell}f''_{q',\ell'}\,d\lambda = 0 . \tag{A.16}$$

$$q' \ge q + 3 \Rightarrow \int f'_{q,\ell}f'_{q',\ell'}\,d\lambda = 0 . \tag{A.17}$$

Proof Only (A.16) and (A.17) are not obvious. To prove (A.16), we observe that on the support of $f''_{q',\ell'}$, the function $f''_{q,\ell}$ is constant, so the result follows from the obvious fact that $\int f''\,d\lambda = 0$. To prove (A.17), we observe that on the support of $f'_{q',\ell'}$, the function $f'_{q,\ell}$ is affine, of the type $a + bx$. Since $\int f'\,d\lambda = 0$, it suffices to check that $\int f'(x)(x - 1/2)\,d\lambda(x) = 0$, which follows by making the change of variables $x \mapsto 1 - x$ and since $f'(x) = f'(1-x)$. □
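The properties of the basic function $f$ that matter most, namely (A.7), (A.10), (A.16) and (A.17), can be verified numerically. The sketch below encodes $f''$ as in the text (the half-open convention at the breakpoints is an assumption that does not affect any integral), obtains $f'$ as the piecewise-linear function with slopes $f''$, and uses a midpoint rule on a dyadic grid; because all breakpoints lie on the grid, the rule is exact here up to rounding. It also checks $\int f\,d\lambda = -\int x f'(x)\,d\lambda(x) = 0$ (integration by parts, using $f(0) = f(1) = 0$). Names such as `f2_ql` are hypothetical.

```python
KNOTS = [0.0, 1 / 8, 3 / 8, 4 / 8, 5 / 8, 7 / 8, 1.0]
FPRIME_AT_KNOTS = [0.0, -1 / 8, 1 / 8, 0.0, 1 / 8, -1 / 8, 0.0]  # f'(x) = int_0^x f''


def f2(x):
    """f'': +1 on [1/8,3/8), [4/8,5/8), [7/8,1]; -1 elsewhere on [0,1]; 0 outside."""
    if x < 0 or x > 1:
        return 0.0
    return 1.0 if (1 / 8 <= x < 3 / 8 or 4 / 8 <= x < 5 / 8 or x >= 7 / 8) else -1.0


def f1(x):
    """f': linear interpolation of FPRIME_AT_KNOTS (slopes are the values of f''); 0 outside [0,1]."""
    if x <= 0 or x >= 1:
        return 0.0
    for a, b, fa, fb in zip(KNOTS, KNOTS[1:], FPRIME_AT_KNOTS, FPRIME_AT_KNOTS[1:]):
        if a <= x <= b:
            return fa + (fb - fa) * (x - a) / (b - a)
    return 0.0


def f2_ql(q, l, x):  # f''_{q,l}(x) = f''(2^q (x - (l-1) 2^-q))
    return f2(2**q * x - (l - 1))


def f1_ql(q, l, x):  # f'_{q,l}(x) = 2^-q f'(2^q (x - (l-1) 2^-q))
    return f1(2**q * x - (l - 1)) / 2**q


def integrate(fun, M=2**10):
    """Midpoint rule on [0,1] with M cells."""
    h = 1.0 / M
    return sum(fun((k + 0.5) * h) for k in range(M)) * h


print(integrate(f2), integrate(f1), integrate(lambda x: x * f1(x)))  # (A.7): all ~0
print(integrate(lambda x: f2_ql(2, 3, x) ** 2), 2**-2)               # (A.10)
```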


We consider a number $r$ with $r \simeq \log N/100$. Given two functions $f, g$ on $[0,1]$, we write as usual $f\otimes g$ for the function on $[0,1]^2$ given by $f\otimes g(x,y) = f(x)g(y)$. We consider an integer $c \ge 3$ which is designed to give us some room (just as the integer $c$ of Sect. 4.8). It is a universal constant which will be determined later. We will be interested in functions of the type$^2$

$$f_k = \frac{2^{2ck-3}}{\sqrt{r}}\sum_{\ell,\ell'\le 2^{ck}} z_{k,\ell,\ell'}\,f_{ck,\ell}\otimes f_{ck,\ell'} , \tag{A.18}$$

where $z_{k,\ell,\ell'} \in \{0,1\}$.


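Lemma A.3.3 Given functions $f_k$ as above for $ck \le r$ and setting $f = \sum_{ck\le r}f_k$, we have

$$\Big\|\frac{\partial^2 f}{\partial x^2}\Big\|_2^2 \le 2^{-6} ;\qquad \Big\|\frac{\partial^2 f}{\partial y^2}\Big\|_2^2 \le 2^{-6} , \tag{A.19}$$

$$\Big\|\frac{\partial^2 f}{\partial x\partial y}\Big\|_2^2 \le 2^{-6} . \tag{A.20}$$

Proof According to (A.16), and since at a given $k$ the functions $f''_{ck,\ell}$ have disjoint supports, the functions $f''_{ck,\ell}$ form an orthogonal system. Thus, given any $y$, we have

$$\begin{aligned}
\int\Big(\frac{\partial^2 f}{\partial x^2}(x,y)\Big)^2 dx &= \sum_{ck\le r}\frac{2^{4ck-6}}{r}\sum_{\ell\le 2^{ck}}\|f''_{ck,\ell}\|_2^2\Big(\sum_{\ell'\le 2^{ck}}z_{k,\ell,\ell'}f_{ck,\ell'}(y)\Big)^2 \\
&= \sum_{ck\le r}\frac{2^{4ck-6}}{r}\sum_{\ell,\ell'\le 2^{ck}}z_{k,\ell,\ell'}\|f''_{ck,\ell}\|_2^2\,f_{ck,\ell'}(y)^2 \le \sum_{ck\le r}\frac{2^{4ck-6}}{r}\sum_{\ell'\le 2^{ck}}f_{ck,\ell'}(y)^2 ,
\end{aligned}$$

where we have used that $(\sum_{\ell'\le 2^{ck}}z_{k,\ell,\ell'}f_{ck,\ell'}(y))^2 = \sum_{\ell'\le 2^{ck}}z_{k,\ell,\ell'}f_{ck,\ell'}(y)^2$ because the functions $(f_{ck,\ell'})_{\ell'\le 2^{ck}}$ have disjoint supports and $z_{k,\ell,\ell'}^2 = z_{k,\ell,\ell'}$, that $z_{k,\ell,\ell'} \le 1$, and that $\|f''_{ck,\ell}\|_2^2 = 2^{-ck}$. Integrating in $y$ and using (A.14) proves the first part of (A.19), and the second part is identical. The proof of (A.20) is similar, using now (A.17) and (A.12). □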


To prove the lower bound, we construct numbers $z_{q,\ell,\ell'} \in \{0,1\}$ by induction over $q$, for $cq \le r$. Defining $f_q$ by (A.18), our goal is that the function $h = \sum_{cq\le r}f_q$ satisfies (A.4) and (A.5) while, for the corresponding (convex!) function (A.6), there is an excess of points under the graph of this function. After we construct $f_q$, we define a dangerous square as a $c(q+1)$-square which contains a point where one of the second-order partial derivatives of $h_q = \sum_{k\le q}f_k$ has an absolute value $\ge 1/32$. Using a bit of technical work in the spirit of Lemma 4.6.7 (which does not use any tight estimate), it automatically follows from Lemma A.3.3 that at most $1/2$ of the $c(q+1)$-squares are dangerous. It is also crucial to observe that the family of dangerous squares is entirely determined by $h_q$. We ensure that the function $h_q$ satisfies (A.4) and (A.5) as follows: When choosing $z_{q+1,\ell,\ell'}$, we take this quantity to be 0 if the corresponding $c(q+1)$-square is dangerous. If it is not, we choose $z_{q+1,\ell,\ell'} = 1$ if and only if by doing so we increase the number of points $X_i$ in the subgraph of the function $g_q := 1/2 + (x^2 + y^2)/8 + h_q$, and otherwise, we choose $z_{q+1,\ell,\ell'} = 0$. Let us estimate this increase. Denoting by $S(g)$ the subgraph of a function $g$, consider the regions $A_+ = S(g_q + h)\setminus S(g_q)$ and $A_- = S(g_q)\setminus S(g_q + h)$, where $h = a f_{c(q+1),\ell}\otimes f_{c(q+1),\ell'}$ for $a = 2^{2c(q+1)-3}/\sqrt{r}$. Let $N_\pm = \operatorname{card}\{i \le N;\ X_i \in A_\pm\}$. Thus, by choosing $z_{q+1,\ell,\ell'} = 1$, we increase the number of points in the subgraph by $N_+ - N_-$. Now, since $h$ is of average zero, we have $\lambda_3(A_+) = \lambda_3(A_-)$, and this volume is $\ge V := 2^{-4c(q+1)}/(L\sqrt{r})$. We can then expect that with probability $\ge 1/4$, we will have $N_+ - N_- \ge \sqrt{NV}/L \ge \sqrt{N}\,2^{-2c(q+1)}/(Lr^{1/4})$. The key point is to show that this will happen for at least a fixed proportion of the possible choices of $\ell$ and $\ell'$, because if this is the case at each step of the construction, we increase the number of points $X_i$ in the subgraph of $g_q$ by at least $\sqrt{N}/(Lr^{1/4})$, and in $r/c$ steps, we reach the required excess number $r^{3/4}\sqrt{N}/L$ of points $X_i$ in this subgraph.

$^2$ The coefficient $2^{-3}$ is just there to ensure there is plenty of room.

Let us detail the crucial step, showing that with high probability, a fixed proportion of the possible choices works. Let us say that a $c(q+1)$-square is safe if it is not dangerous, and is favorable if (with the notation above) we have $N_+ - N_- \ge \sqrt{NV}/L \ge \sqrt N\,2^{-2c(q+1)}/(Lr^{1/4})$. Our goal is to show that with probability close to 1, a proportion at least 1/16 of the $c(q+1)$-squares are both safe and favorable.³ The argument for doing this is a refinement of the arguments given at the end of Sect. 4.8. For each $k\le q$, there are $2^{2ck}$ numbers $Z_{k,\ell,\ell'}$ to choose, for a number $2^{2^{2ck}}$ of possible choices. As $k\le q$ varies, this gives a total number of at most $2^{2^{2cq+1}}$ choices for the function $f_q$, and therefore for the family of safe $c(q+1)$-squares. Using poissonization, given any family of $c(q+1)$-squares of cardinality $M$, the probability that less than $M/8$ are favorable is $\le\exp(-\beta M)$, where $\beta$ is a universal constant (see (4.98)). The family of safe $c(q+1)$-squares has cardinality $M\ge 2^{2c(q+1)-1}$, so that for this family, this probability is $\le\exp(-\beta M)\le\exp(-\beta 2^{2c(q+1)-1})$. The probability that this happens for at least one of the at most $2^{2^{2cq+1}}$ possible families of safe $c(q+1)$-squares is then at most $2^{2^{2cq+1}}\exp(-\beta 2^{2c(q+1)-1})$. Choosing the constant $c$ such that $\beta 2^{2c-1}$ is large enough, it is almost certain that for each of the possible families of safe $c(q+1)$-squares, at least a proportion of 1/8 of its $c(q+1)$-squares will be favorable.


³ One difficulty is that the previous steps of the construction, as well as the set of dangerous $c(q+1)$-squares, depend on the $X_i$.


Appendix B 
Some Deterministic Arguments 


B.1 Hall’s Matching Theorem


Proof of Proposition 4.3.2. Let us denote by $a$ the quantity $\sup\sum_{i\le N}(w_i+w'_i)$, where the supremum is taken over the families $(w_i)_{i\le N}$, $(w'_i)_{i\le N}$ which satisfy (4.28), that is, $w_i+w'_j\le c_{ij}$ for all $i,j\le N$. For families $(w_i)_{i\le N}$, $(w'_i)_{i\le N}$ satisfying (4.28), for any permutation $\pi$ of $\{1,\dots,N\}$, we have
$$\sum_{i\le N}c_{i\pi(i)}\ge\sum_{i\le N}\big(w_i+w'_{\pi(i)}\big)=\sum_{i\le N}(w_i+w'_i),$$
and taking the supremum over the values of $w_i$ and $w'_i$, we get
$$\sum_{i\le N}c_{i\pi(i)}\ge a,$$
so that $M(C)\ge a$.

The converse relies on the Hahn-Banach theorem. Consider the subset $\mathcal C$ of $\mathbb R^{N\times N}$ that consists of the vectors $(x_{ij})_{i,j\le N}$ for which there exist numbers $(w_i)_{i\le N}$ and $(w'_i)_{i\le N}$ such that
$$\sum_{i\le N}(w_i+w'_i)>a, \tag{B.1}$$
$$\forall i,j\le N,\quad x_{ij}\ge w_i+w'_j. \tag{B.2}$$
Then by definition of $a$, we have $(c_{ij})_{i,j\le N}\notin\mathcal C$. It is obvious that $\mathcal C$ is an open convex subset of $\mathbb R^{N\times N}$. Thus, we can separate the point $(c_{ij})_{i,j\le N}$ from $\mathcal C$ by a linear functional, that is, we can find numbers $(p_{ij})_{i,j\le N}$ such that
$$\forall(x_{ij})\in\mathcal C,\quad\sum_{i,j\le N}p_{ij}c_{ij}<\sum_{i,j\le N}p_{ij}x_{ij}. \tag{B.3}$$
By definition of $\mathcal C$, it is obvious that if $(x_{ij})\in\mathcal C$ and $y_{ij}\ge 0$, then $(x_{ij}+y_{ij})\in\mathcal C$. In particular (B.2) remains true when one replaces $x_{ij}$ by $x_{ij}+y_{ij}$. This implies that $p_{ij}\ge 0$ for each $i,j$ (otherwise the right-hand side of (B.3) could be made arbitrarily negative). Furthermore, because of the strict inequality in (B.3), not all the numbers $p_{ij}$ are 0. Thus, there is no loss of generality to assume that $\sum_{i,j\le N}p_{ij}=N$. Consider families $(w_i)_{i\le N}$, $(w'_i)_{i\le N}$ that satisfy (B.1). Then if $x_{ij}=w_i+w'_j$, the point $(x_{ij})_{i,j\le N}$ belongs to $\mathcal C$, and using (B.3) for this point, we obtain


$$\sum_{i,j\le N}p_{ij}c_{ij}<\sum_{i,j\le N}p_{ij}(w_i+w'_j). \tag{B.4}$$
This holds whenever the numbers $(w_i)$ and $(w'_i)$ satisfy (B.1). Considering numbers $(y_i)_{i\le N}$ with $\sum_{i\le N}y_i=0$, the numbers $(w_i+y_i)$ and $(w'_i)$ satisfy (B.1), and from (B.4), we have
$$\sum_{i,j\le N}p_{ij}c_{ij}<\sum_{i,j\le N}p_{ij}(w_i+y_i+w'_j)=\sum_{i,j\le N}p_{ij}(w_i+w'_j)+\sum_{i\le N}y_i\Big(\sum_{j\le N}p_{ij}\Big). \tag{B.5}$$
This inequality holds whenever $\sum_{i\le N}y_i=0$, so that replacing $y_i$ by $\lambda y_i$ for $\lambda\in\mathbb R$, the previous inequality remains true. Therefore, the last term in (B.5) must be 0. We have shown that
$$\sum_{i\le N}y_i=0\ \Rightarrow\ \sum_{i\le N}y_i\Big(\sum_{j\le N}p_{ij}\Big)=0,$$
and this forces in turn all the sums $\sum_{j\le N}p_{ij}$ to be equal. Since $\sum_{i,j\le N}p_{ij}=N$, we have $\sum_{j\le N}p_{ij}=1$ for all $i$. Similarly, we have $\sum_{i\le N}p_{ij}=1$ for all $j$, that is, the matrix $(p_{ij})_{i,j\le N}$ is bistochastic. Thus (B.4) becomes
$$\sum_{i,j\le N}p_{ij}c_{ij}<\sum_{i\le N}(w_i+w'_i),$$
so that $\sum_{i,j\le N}p_{ij}c_{ij}\le a$. The set of bistochastic matrices is a convex set, so the infimum of $\sum_{i,j\le N}p_{ij}c_{ij}$ over this convex set is attained at an extreme point. The extreme points are of the type $p_{ij}=\mathbf 1_{\{\pi(i)=j\}}$ for a permutation $\pi$ of $\{1,\dots,N\}$ (a classical result known as Birkhoff’s theorem), so that we can find such a permutation with $\sum_{i\le N}c_{i\pi(i)}\le a$, that is, $M(C)\le a$. □
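The equality $M(C)=a$ can be checked numerically on small matrices. The sketch below is only an illustration and not the Hahn-Banach argument above: it uses the classical Hungarian algorithm (a standard primal-dual method; the function names are ours), which returns an optimal permutation together with numbers $w_i, w'_j$ satisfying (4.28) whose total equals the optimal cost, and compares that cost with a brute-force minimum over all permutations.

```python
from itertools import permutations


def brute_force_cost(c):
    """M(C): minimum over permutations pi of sum_i c[i][pi(i)], by exhaustive search."""
    n = len(c)
    return min(sum(c[i][p[i]] for i in range(n)) for p in permutations(range(n)))


def hungarian(c):
    """O(n^3) Hungarian algorithm for a square cost matrix c.

    Returns (pi, w, w_prime): pi is an optimal permutation, and
    w[i] + w_prime[j] <= c[i][j] for all i, j (dual feasibility, as in (4.28)).
    """
    n = len(c)
    inf = float("inf")
    u = [0] * (n + 1)      # row potentials, 1-indexed
    v = [0] * (n + 1)      # column potentials, v[0] is an auxiliary slot
    match = [0] * (n + 1)  # match[j] = row assigned to column j (0 = none)
    way = [0] * (n + 1)    # back-pointers along the alternating path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = c[i0 - 1][j - 1] - u[i0] - v[j]  # reduced cost
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):  # update the potentials
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        while True:  # augment along the alternating path
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
            if j0 == 0:
                break
    pi = [0] * n
    for j in range(1, n + 1):
        pi[match[j] - 1] = j - 1
    return pi, u[1:], v[1:]
```

On every instance, the cost of the permutation found, the brute-force value $M(C)$ and the sum $\sum_i(w_i+w'_i)$ coincide, which is the content of Proposition 4.3.2 for these matrices.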


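Proof of Hall’s Marriage Lemma. We set $c_{ij}=0$ if $j\in A(i)$ and $c_{ij}=1$ otherwise. Using the notations of Proposition 4.3.2, we aim to prove that $M(C)=0$. Using (4.27), it suffices to show that given numbers $u_i\,(=-w_i)$, $v_j\,(=w'_j)$, we have
$$\big(\forall i\le N,\ \forall j\in A(i),\ v_j\le u_i\big)\ \Rightarrow\ \sum_{i\le N}v_i\le\sum_{i\le N}u_i. \tag{B.6}$$
Adding a suitable constant (the same one to all the $u_i$ and $v_j$), we may assume $v_i\ge 0$ and $u_i\ge 0$ for all $i$, and thus
$$\sum_{i\le N}u_i=\int_0^\infty\operatorname{card}\{i\le N;\ u_i>t\}\,dt, \tag{B.7}$$
$$\sum_{i\le N}v_i=\int_0^\infty\operatorname{card}\{i\le N;\ v_i>t\}\,dt. \tag{B.8}$$
Given $t$, using (4.33) for $I=\{i\le N;\ u_i\le t\}$ and since $v_j\le u_i$ if $j\in A(i)$, we obtain
$$\operatorname{card}\{j\le N;\ v_j\le t\}\ge\operatorname{card}\{i\le N;\ u_i\le t\}$$
and thus
$$\operatorname{card}\{i\le N;\ v_i>t\}\le\operatorname{card}\{i\le N;\ u_i>t\}.$$
Combining with (B.7) and (B.8), this proves (B.6). □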
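Hall's condition (4.33) and the existence of a matching with $\pi(i)\in A(i)$ can be compared by brute force on small random instances. The sketch below uses our own conventions (each $A(i)$ is a Python set of indices $0,\dots,N-1$; the function names are ours) and finds matchings with the standard augmenting-path method rather than through the duality argument above.

```python
from itertools import combinations


def hall_condition(adj):
    """Check card(union of A(i), i in I) >= card(I) for every nonempty I."""
    n = len(adj)
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            neighbours = set().union(*(adj[i] for i in subset))
            if len(neighbours) < r:
                return False
    return True


def perfect_matching(adj):
    """Augmenting-path (Kuhn) algorithm: return pi with pi[i] in adj[i], or None."""
    n = len(adj)
    owner = [None] * n  # owner[j] = i when j is currently matched to i

    def try_assign(i, seen):
        for j in adj[i]:
            if j not in seen:
                seen.add(j)
                # j is free, or its current owner can be moved elsewhere
                if owner[j] is None or try_assign(owner[j], seen):
                    owner[j] = i
                    return True
        return False

    for i in range(n):
        if not try_assign(i, set()):
            return None
    pi = [None] * n
    for j, i in enumerate(owner):
        pi[i] = j
    return pi
```

A matching is found exactly when Hall's condition holds, as the lemma asserts.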


B.2 Proof of Lemma 4.7.11


Consider the subset $\mathcal L^*$ of $\mathcal L$ consisting of the functions $f$ for which $f(1/2)=0$. To $f\in\mathcal L^*$, we associate the curve $W(f)$ traced out by the map
$$u\mapsto\big(\tau^1+2^{k+1}f(u),\ \tau^2+2^{k+1}f(u+1/2)\big),\qquad 0\le u\le 1/2,$$
where $(\tau^1,\tau^2)=\tau$. A curve in $\mathcal C(\tau,k)$ can be parameterized, starting at $\tau$ and moving at speed 1 along each successive edge, so that it is the range of a map of the type $t\mapsto\tau+\varphi(t)$ where $\varphi$ is a 1-Lipschitz map from $[0,2^k]$ to $\mathbb R^2$ with $\varphi(0)=0=\varphi(2^k)$. Denoting by $\varphi_0$ and $\varphi_1$ the components of $\varphi$, the curve is therefore the range of a map of the type $t\mapsto(\tau^1+\varphi_0(t),\ \tau^2+\varphi_1(t))$ where $\varphi_0$ and $\varphi_1$ are 1-Lipschitz maps from $[0,2^k]$ to $\mathbb R$ with $\varphi_0(0)=\varphi_1(0)=\varphi_0(2^k)=\varphi_1(2^k)=0$. Considering the function $f$ on $[0,1]$ given by $f(u)=2^{-k-1}\varphi_0(2^{k+1}u)$ for $u\le 1/2$ and $f(u)=2^{-k-1}\varphi_1(2^{k+1}(u-1/2))$ for $1/2\le u\le 1$ proves that $\mathcal C(\tau,k)\subset W(\mathcal L^*)$. We set $T=W^{-1}(\mathcal C(\tau,k))$. Consider $f_0$ and $f_1$ in $T$ and the map $h:[0,1]^2\to[0,1]^2$ given by
$$h(u,v)=\Big(\tau^1+2^{k+1}\Big(vf_0\big(\tfrac u2\big)+(1-v)f_1\big(\tfrac u2\big)\Big),\ \tau^2+2^{k+1}\Big(vf_0\big(\tfrac{1+u}2\big)+(1-v)f_1\big(\tfrac{1+u}2\big)\Big)\Big).$$
The area of $h([0,1]^2)$ is at most $\int_{[0,1]^2}|Jh(u,v)|\,du\,dv$, where $Jh$ is the Jacobian of $h$, and a straightforward computation gives
$$Jh(u,v)=2^{2k+1}\Big(\big(vf_0'(\tfrac u2)+(1-v)f_1'(\tfrac u2)\big)\big(f_0(\tfrac{1+u}2)-f_1(\tfrac{1+u}2)\big)-\big(vf_0'(\tfrac{1+u}2)+(1-v)f_1'(\tfrac{1+u}2)\big)\big(f_0(\tfrac u2)-f_1(\tfrac u2)\big)\Big),$$
so that since $|f_0'|\le 1$, $|f_1'|\le 1$,
$$|Jh(u,v)|\le 2^{2k+1}\Big(\big|f_0(\tfrac u2)-f_1(\tfrac u2)\big|+\big|f_0(\tfrac{1+u}2)-f_1(\tfrac{1+u}2)\big|\Big).$$
The Cauchy-Schwarz inequality implies
$$\int_{[0,1]^2}|Jh(u,v)|\,du\,dv\le L2^{2k}\|f_0-f_1\|_2. \tag{B.9}$$
If $x$ does not belong to the range of $h$, both curves $W(f_0)$ and $W(f_1)$ “turn the same number of times around $x$”. This is because “the number of times the closed curve $u\mapsto h(u,v)$ turns around $x$” is then a continuous function of $v$, and since it is integer valued, it is constant. In particular, it takes the same value for $v=0$ and $v=1$. Consequently, writing $\mathring W(f)$ for the interior of the closed curve $W(f)$, either $x\in\mathring W(f_0)\cap\mathring W(f_1)$ or $x\notin\mathring W(f_0)\cup\mathring W(f_1)$.

Thus, the range of $h$ contains the symmetric difference $\mathring W(f_0)\,\Delta\,\mathring W(f_1)$, and (B.9) implies (4.115). □
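The topological step only uses that the winding number of a closed curve around a point not on it is an integer which does not change under a deformation avoiding the point. A minimal numerical sketch, assuming closed curves are given as polygons (lists of vertices; the helper names are ours): the winding number is obtained by adding the signed angles under which successive edges are seen from $x$, and two nested squares are interpolated vertex by vertex, in the spirit of the map $h$.

```python
import math


def winding_number(polygon, x):
    """Number of counterclockwise turns of the closed polygon around x (x not on the polygon)."""
    total = 0.0
    n = len(polygon)
    for k in range(n):
        (ax, ay), (bx, by) = polygon[k], polygon[(k + 1) % n]
        d = math.atan2(by - x[1], bx - x[0]) - math.atan2(ay - x[1], ax - x[0])
        # each edge is seen from x under an angle of absolute value < pi
        while d <= -math.pi:
            d += 2 * math.pi
        while d > math.pi:
            d -= 2 * math.pi
        total += d
    return round(total / (2 * math.pi))


def interpolate(p0, p1, v):
    """Vertexwise combination v * p0 + (1 - v) * p1 of two closed polygons."""
    return [(v * a0 + (1 - v) * a1, v * b0 + (1 - v) * b1)
            for (a0, b0), (a1, b1) in zip(p0, p1)]


outer = [(0.2, 0.2), (0.8, 0.2), (0.8, 0.8), (0.2, 0.8)]
inner = [(0.3, 0.3), (0.7, 0.3), (0.7, 0.7), (0.3, 0.7)]
```

The center is never on an interpolated square, and its winding number stays 1 for every $v$; the point $(0.25,0.5)$ lies inside one square and not the other, so, as in the argument above, it must be swept by the deformation (it lies on the interpolated square for $v=1/2$).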


B.3 The Shor-Leighton Grid Matching Theorem


Let us say that a simple curve $C$ traced on $G$ is a chord if it is the image of $[0,1]$ under a continuous map $\varphi$ where $\varphi(0)$ and $\varphi(1)$ belong to the boundary of $[0,1]^2$. If $C$ is a chord, $]0,1[^2\setminus C$ is the union of two regions $R_1$ and $R_2$, and (assuming without loss of generality that no point $X_i$ belongs to $G$),
$$\sum_{i\le N}\big(\mathbf 1_{R_1}(X_i)-\lambda(R_1)\big)=-\sum_{i\le N}\big(\mathbf 1_{R_2}(X_i)-\lambda(R_2)\big).$$
We define
$$D(C)=\Big|\sum_{i\le N}\big(\mathbf 1_{R_1}(X_i)-\lambda(R_1)\big)\Big|=\Big|\sum_{i\le N}\big(\mathbf 1_{R_2}(X_i)-\lambda(R_2)\big)\Big|.$$
If $C$ is a chord, “completing $C$ by following the boundary of $[0,1]^2$” produces a closed simple curve $C'$ on $G$ such that either $R_1$ or $R_2$ is the interior of $C'$. The length we add along each side of the boundary is less than or equal to the length of the chord itself, so that $\ell(C')\le 3\ell(C)$. Thus, the following is a consequence of Theorem 4.7.2:

Theorem B.3.1 With probability at least $1-L\exp(-(\log N)^{3/2}/L)$, for each chord $C$, we have
$$D(C)\le L\,\ell(C)\sqrt N(\log N)^{3/4}. \tag{B.10}$$


Proof of Theorem 4.7.1. Consider a number $\ell_2<\ell_1$, to be determined later, and the grid $G'\subset G$ of mesh width $2^{-\ell_2}$. (This is the slightly coarser grid we mentioned on page 153.)

A union of squares of $G'$ is called a domain. Given a domain $R$, we denote by $R'$ the union of the squares of $G'$ such that at least one of the four edges that form their boundary is entirely contained in $R$ (recall that squares include their boundaries). The main argument is to establish that if (4.103) and (B.10) hold, and provided $\ell_2$ has been chosen appropriately, then for any choice of $R$, we have
$$N\lambda(R')\ge\operatorname{card}\{i\le N;\ X_i\in R\}. \tag{B.11}$$


We will then conclude with Hall’s Marriage Lemma. The basic idea to prove (B.11) 
is to reduce to the case where R is the closure of the interior of a simple closed 
curve minus a number of “holes” which are themselves the interiors of simple closed 
curves. 

Let us say that a domain $R$ is decomposable if $R=R_1\cup R_2$ where $R_1$ and $R_2$ are non-empty unions of squares of $G'$ and when every square of $G'$ included in $R_1$ has at most one vertex belonging to $R_2$. (Equivalently, $R_1\cap R_2$ is finite.) We can write $R=R_1\cup\dots\cup R_k$ where each $R_\ell$ is undecomposable (i.e., not decomposable) and where any two of these sets have a finite intersection. This is obvious by writing $R$ as the union of as many domains as possible, under the condition that the intersection of any two of these domains is finite. Then each of them must be undecomposable. We claim that
$$\frac14\sum_{\ell\le k}\lambda(R'_\ell\setminus R_\ell)\le\lambda(R'\setminus R). \tag{B.12}$$
l<k 


To see this, let us set $S_\ell=R'_\ell\setminus R_\ell$, so that by definition of $R'_\ell$, $S_\ell$ is the union of the squares $D$ of $G'$ that have at least one of the edges that form their boundary contained in $R_\ell$ but are not themselves contained in $R_\ell$. Obviously, we have $S_\ell\subset R'$. When $\ell\ne\ell'$, the sets $R_\ell$ and $R_{\ell'}$ have a finite intersection, so that a square $D$ contained in $S_\ell$ cannot be contained in $R_{\ell'}$, since it has an entire edge contained in $R_\ell$. Since $D$ is not contained in $R_\ell$ either, it is not contained in $R$. Thus, the interior of $D$ is contained in $R'\setminus R$, and since this is true for any square $D$ of $S_\ell$ and any $\ell\le k$, we have
$$\lambda\Big(\bigcup_{\ell\le k}S_\ell\Big)\le\lambda(R'\setminus R).$$
Moreover, a given square $D$ of $G'$ can be contained in a set $S_\ell$ for at most four values of $\ell$ (one for each of the edges of $D$), so that
$$\sum_{\ell\le k}\lambda(R'_\ell\setminus R_\ell)=\sum_{\ell\le k}\lambda(S_\ell)\le 4\lambda\Big(\bigcup_{\ell\le k}S_\ell\Big).$$
This proves (B.12).
To prove that (B.11) holds for any domain $R$, it suffices to prove that when $R$ is an undecomposable domain, we have (pessimistically)
$$\frac N4\lambda(R'\setminus R)\ge\operatorname{card}\{i\le N;\ X_i\in R\}-N\lambda(R). \tag{B.13}$$
Indeed, writing (B.13) for $R=R_\ell$, summing over $\ell\le k$ and using (B.12) implies (B.11), since $R\subset R'$.

We turn to the proof of (B.13) when $R$ is an undecomposable domain. The boundary $S$ of $R$ is a subset of $G'$. Inspection of the cases shows that:

If a vertex $\tau$ of $G'$ belongs to $S$, either 2 or 4 of the edges of $G'$ incident to $\tau$ are contained in $S$. (B.14)


Next, we show that any subset $S$ of $G'$ that satisfies (B.14) is a union of closed simple curves, any two of them intersecting only at vertices of $G'$. (This is simply the decomposition into cycles of Eulerian graphs.) To see this, it suffices to construct a closed simple curve $C$ contained in $S$, to remove $C$ from $S$ and to iterate, since $S\setminus C$ still satisfies (B.14). The construction goes as follows. Starting with an edge $\tau_1\tau_2$ in $S$, we find successively edges $\tau_2\tau_3,\tau_3\tau_4,\dots$ with $\tau_k\ne\tau_{k-2}$, and we continue the construction until the first time $\tau_k=\tau_\ell$ for some $\ell<k-2$ (in fact $\ell<k-3$). Then the edges $\tau_\ell\tau_{\ell+1},\tau_{\ell+1}\tau_{\ell+2},\dots,\tau_{k-1}\tau_k$ define a closed simple curve contained in $S$.

Thus, the boundary of an undecomposable domain $R$ is a union of closed simple curves $C_1,\dots,C_k$, any two of them having at most a finite intersection.
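The cycle-extraction procedure can be sketched as follows, assuming the edge set $S$ is given as a list of pairs of grid vertices in which every vertex has even degree (the function name is ours). Each pass walks without stepping straight back, as in the rule $\tau_k\ne\tau_{k-2}$, until a vertex repeats, records the closed simple curve so obtained, and deletes its edges, which keeps every degree even.

```python
from collections import defaultdict


def split_into_closed_curves(edges):
    """Split an edge set in which every vertex has even degree into closed simple curves.

    Edges are pairs of hashable vertices; each returned curve is a list of
    vertices whose first and last entries coincide.
    """
    remaining = {frozenset(e) for e in edges}
    curves = []
    while remaining:
        neighbours = defaultdict(set)
        for e in remaining:
            a, b = tuple(e)
            neighbours[a].add(b)
            neighbours[b].add(a)
        a, b = tuple(next(iter(remaining)))
        path, position = [a, b], {a: 0, b: 1}
        while True:
            prev, cur = path[-2], path[-1]
            # never step straight back; an even degree >= 2 makes this possible
            nxt = next(w for w in neighbours[cur] if w != prev)
            if nxt in position:  # first repeated vertex: close the curve
                curve = path[position[nxt]:] + [nxt]
                break
            position[nxt] = len(path)
            path.append(nxt)
        for x, y in zip(curve, curve[1:]):
            remaining.remove(frozenset((x, y)))
        curves.append(curve)
    return curves
```

For instance, two unit squares of the grid sharing one vertex (a vertex where 4 edges of $S$ meet) are split into the two squares.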


We next show that for each $\ell$, writing $\mathring C_\ell$ for the interior of $C_\ell$ and $\overline C_\ell$ for its closure, the set $R$ is either contained in $\overline C_\ell$ (so that $C_\ell$ is then the “outer boundary” of $R$) or else $\mathring C_\ell\cap R=\emptyset$ (in which case $C_\ell$ is “a hole” in $R$). Let us fix $\ell$ and assume otherwise that $\mathring C_\ell\cap R\ne\emptyset$ and $R\not\subset\overline C_\ell$. We will show that this contradicts the fact that $R$ is undecomposable. Consider the domain $R_1$ which is the union of the squares of $G'$ that are contained in $R$ but not in $\overline C_\ell$, so that $R_1$ is not empty by hypothesis. Consider also the domain $R_2$ that is the union of the squares of $G'$ contained in $R$ whose interiors are contained in $\mathring C_\ell$. Then $R_2$ is not empty either. Given a square of $G'$, and since $\mathring C_\ell$ is the interior of $\overline C_\ell$, either its interior is contained in $\mathring C_\ell$ or else the square is not contained in $\overline C_\ell$. This proves that $R=R_1\cup R_2$. Next, we show that the domains $R_1$ and $R_2$ cannot have an edge of the grid $G'$ in common. Assuming for contradiction that such an edge exists, it is an edge of exactly two squares $A$ and $B$ of $G'$. One of these squares is a subset of $R_1$, and the other is a subset of $R_2$. Thus, the edge must belong to $C_\ell$, for otherwise $A$ and $B$ would be “on the same side of $C_\ell$”, and they would both be subsets of $R_1$ or both subsets of $R_2$. Next, we observe that this edge cannot be on the boundary of $R$ because both $A$ and $B$ are subsets of $R$. This contradicts the fact that $C_\ell$ is contained in the boundary of $R$, therefore proving that $R_1$ and $R_2$ cannot have an edge in common. Since $R=R_1\cup R_2$, this in turn would imply that $R$ is decomposable, contradicting our assumption.

If $C_\ell$ is an outer boundary of $R$, then $R\subset\overline C_\ell$, and consequently for each $\ell'$, we have $\overline C_{\ell'}\subset\overline C_\ell$. Thus, if $C_{\ell'}$ is also an outer boundary of $R$, then $\overline C_{\ell'}=\overline C_\ell$, so that $C_\ell=C_{\ell'}$, contradicting (when $\ell'\ne\ell$) the fact that these two curves have a finite intersection.

Thus, without loss of generality, we may assume that $C_1$ is the only outer boundary of $R$, and that for $2\le\ell\le k$, we have $R\cap\mathring C_\ell=\emptyset$. The goal now is to prove that
$$R=\overline C_1\setminus\bigcup_{2\le\ell\le k}\mathring C_\ell. \tag{B.15}$$
It is obvious that $R\subset\overline C_1\setminus\bigcup_{2\le\ell\le k}\mathring C_\ell$, so that we have to show that $D:=\big(\overline C_1\setminus\bigcup_{2\le\ell\le k}\mathring C_\ell\big)\setminus R$ is empty. We assume for contradiction that $D$ is not empty. Consider a square $A$ of $G'$ which is contained in $D$, and a square $A'$ of $G'$ which has an edge in common with $A$. First, we claim that $A'\subset\overline C_1$. Otherwise, $A$ and $A'$ would have to be on different sides of $C_1$, which means that their common edge has to belong to $C_1$ and hence to the boundary of $R$. This is impossible because neither $A$ nor $A'$ is then a subset of $R$. Indeed in the case of $A'$, this is because we assume that $A'\not\subset\overline C_1$, and in the case of $A$, this is because we assume that $A\subset D$. Exactly the same argument shows that the interior of $A'$ cannot be contained in $\mathring C_\ell$ for $2\le\ell\le k$. Indeed then, $A$ and $A'$ would be on different sides of $C_\ell$, so that their common edge would belong to $C_\ell$ and hence to the boundary of $R$, which is impossible since neither $A$ nor $A'$ is a subset of $R$. We have now shown that $A$ and $A'$ lie on the same side of each curve $C_\ell$, so that their common edge cannot belong to the boundary of $R$, and since $A$ is not contained in $R$, this is not the case of $A'$ either. Consequently, the definition of $D$ shows that $A'\subset D$, but since $A$ was an arbitrary square contained in $D$, this is absurd and completes the proof that $D=\emptyset$ and of (B.15).

Let $R'_\ell$ be the union of the squares of $G'$ that have at least one edge contained in $C_\ell$. Thus, as in (B.12), we have
$$\sum_{\ell\le k}\lambda(R'_\ell\setminus R)\le 4\lambda(R'\setminus R),$$
and to prove (B.13), it suffices (recalling that we assume that no point $X_i$ belongs to $G$) to show that for each $1\le\ell\le k$, we have
$$\big|\operatorname{card}\{i\le N;\ X_i\in\mathring C_\ell\}-N\lambda(\mathring C_\ell)\big|\le N2^{-4}\lambda(R'_\ell\setminus R). \tag{B.16}$$
For $\ell\ge 2$, $C_\ell$ does not intersect the boundary of $[0,1]^2$. Each edge contained in $C_\ell$ is in the boundary of $R$. One of the two squares of $G'$ that contain this edge is included in $R'_\ell\setminus R$ and the other in $R$. Since a given square contained in $R'_\ell\setminus R$ must arise in this manner from one of its four edges, we have
$$\lambda(R'_\ell\setminus R)\ge 2^{-\ell_2-2}\ell(C_\ell). \tag{B.17}$$
On the other hand, (4.103) implies
$$\big|\operatorname{card}\{i\le N;\ X_i\in\mathring C_\ell\}-N\lambda(\mathring C_\ell)\big|\le L\,\ell(C_\ell)\sqrt N(\log N)^{3/4}. \tag{B.18}$$
Assuming
$$2^{-\ell_2}\ge L2^6\frac{(\log N)^{3/4}}{\sqrt N}, \tag{B.19}$$
where $L$ is the constant of (B.18), we have, using (B.17) in the last inequality,
$$L\,\ell(C_\ell)\sqrt N(\log N)^{3/4}\le 2^{-\ell_2-6}N\ell(C_\ell)\le N2^{-4}\lambda(R'_\ell\setminus R),$$
and (B.16) follows.
When $\ell=1$, (B.17) need not be true because parts of $C_1$ might be traced on the boundary of $[0,1]^2$. In that case, we simply decompose $C_1$ into a disjoint union of chords and of parts of the boundary of $[0,1]^2$ to deduce (B.16) from (B.10).

Thus, we have proved that (4.103) and (B.10) imply (B.11) provided that (B.19) holds. Next, for a domain $R$, we denote by $R^*$ the set of points which are within distance $2^{-\ell_2}$ of $R'$, and we show that, provided
$$2^{-\ell_2}\ge\frac{20}{\sqrt N}, \tag{B.20}$$
we have
$$\operatorname{card}\{i\le N;\ Y_i\in R^*\}\ge N\lambda(R'). \tag{B.21}$$
This is simply because, since the sequence $(Y_i)$ is evenly spread, the points $Y_i$ are centers of disjoint rectangles of area $1/N$ and diameter $\le 20/\sqrt N$. There are at least $N\lambda(R')$ points $Y_i$ such that the corresponding rectangle intersects $R'$ (because the union of these rectangles covers $R'$), and (B.20) implies that these little rectangles are entirely contained in $R^*$. Therefore (B.11) and (B.21) imply
$$\operatorname{card}\{i\le N;\ Y_i\in R^*\}\ge\operatorname{card}\{i\le N;\ X_i\in R\}. \tag{B.22}$$
Next, consider a subset $I$ of $\{1,\dots,N\}$ and let $R$ be the domain that is the union of the squares of $G'$ that contain at least a point $X_i$, $i\in I$. Then, using (B.22),
$$\operatorname{card}I\le\operatorname{card}\{i\le N;\ X_i\in R\}\le\operatorname{card}\{i\le N;\ Y_i\in R^*\}. \tag{B.23}$$
A point of $R'$ is within distance $2^{-\ell_2}$ of a point of $R$. A point of $R^*$ is within distance $2^{-\ell_2+1}$ of a point of $R$. A point of $R$ is within distance $\sqrt2\cdot 2^{-\ell_2}<2^{-\ell_2+1}$ of a point $X_i$ with $i\in I$. Consequently, each point of $R^*$ is within distance $\le 2^{-\ell_2+2}$ of a point $X_i$ with $i\in I$. Therefore, if we define
$$A(i)=\{j\le N;\ d(X_i,Y_j)\le 2^{-\ell_2+2}\},$$
we have proved that $\{j\le N;\ Y_j\in R^*\}\subset\bigcup_{i\in I}A(i)$, and combining with (B.23) that
$$\operatorname{card}\bigcup_{i\in I}A(i)\ge\operatorname{card}I.$$
Hall’s Marriage Lemma (Corollary 4.3.5) then shows that we can find a matching $\pi$ for which $\pi(i)\in A(i)$ for any $i\le N$, so that by definition of $A(i)$
$$\sup_{i\le N}d(X_i,Y_{\pi(i)})\le 2^{-\ell_2+2}\le\frac L{\sqrt N}(\log N)^{3/4},$$




by taking for $\ell_2$ the largest integer that satisfies (B.19) and (B.20). Since this is true whenever (4.103) and (B.10) occur, the proof of (4.101) is complete. □


B.4 End of Proof of Theorem 17.2.1 


The most difficult point is to ensure that the functions $h\mathbf 1_R$ satisfy (17.47). In fact, rather than (17.47), we shall prove that
$$\forall(k,\ell)\in R,\quad|h(k,\ell)|\le L(k_2-k_1), \tag{B.24}$$
which suffices by homogeneity.
For $j\le q\le p$, we consider the partition $\mathcal D(q)$ of $G$ consisting of all the sets of the type
$$\{a2^q+1,\dots,(a+1)2^q\}\times\{b2^{q-j}+1,\dots,(b+1)2^{q-j}\}, \tag{B.25}$$
where $a$ and $b$ are integers with $0\le a<2^{p-q}$ and $0\le b<2^{p-q+j}$. For $3\le q<j$, we define $\mathcal D(q)$ as the partition consisting of all the sets of the type
$$\{a2^q+1,\dots,(a+1)2^q\}\times\{b\}, \tag{B.26}$$
where $0\le a<2^{p-q}$ and $1\le b\le 2^p$.

We observe that if $q'\ge q$, $R'\in\mathcal D(q')$ and $R\in\mathcal D(q)$, then either $R\subset R'$ or $R\cap R'=\emptyset$.

Fixing a function $h\in\mathcal H_j(2^{2p-j})$, we consider the set $C=\{(k,\ell);\ h(k,\ell)\ne 0\}$, so $\operatorname{card}C\le 2^{2p-j}$. We proceed to the following construction. Keeping in mind that the sequence $(\mathcal D(q))$ of partitions increases, so that $\mathcal D(p)$ consists of the largest rectangles, we first consider the set $U(p)$ that is the union of all rectangles $R\in\mathcal D(p)$ such that
$$\operatorname{card}(R\cap C)\ge\frac18\operatorname{card}R. \tag{B.27}$$
Then we consider the union $U(p-1)$ of all the rectangles $R\in\mathcal D(p-1)$ that are not contained in $U(p)$ and that satisfy (B.27), and we continue in this manner until we construct $U(3)$. Since the sets $U(p),\dots,U(3)$ are disjoint and each is a union of disjoint sets satisfying (B.27), we get
$$\sum_{3\le q\le p}\operatorname{card}U(q)\le 8\operatorname{card}C\le 2^{2p-j+3}. \tag{B.28}$$
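The selection of $U(p),U(p-1),\dots,U(3)$ can be simulated on a small grid. The sketch below is illustrative only: points are pairs $(k,\ell)$ with $1\le k,\ell\le 2^p$, the partitions follow (B.25) for $q\ge j$ and (B.26) for $3\le q<j$, the set $C$ is chosen at random, and the function names are ours. It checks (B.28), (B.29) and (B.30) numerically.

```python
def partition(q, p, j):
    """Rectangles of D(q) as frozensets of points (k, l): (B.25) if q >= j, (B.26) if q < j."""
    rects = []
    for a in range(2 ** (p - q)):
        ks = range(a * 2 ** q + 1, (a + 1) * 2 ** q + 1)
        if q >= j:
            height = 2 ** (q - j)
            l_ranges = [range(b * height + 1, (b + 1) * height + 1)
                        for b in range(2 ** (p - q + j))]
        else:
            l_ranges = [range(b, b + 1) for b in range(1, 2 ** p + 1)]
        for ls in l_ranges:
            rects.append(frozenset((k, l) for k in ks for l in ls))
    return rects


def select_rectangles(C, p, j):
    """Return {q: list of rectangles forming U(q)}, choosing from q = p down to q = 3."""
    covered = set()
    chosen = {}
    for q in range(p, 2, -1):
        # keep rectangles not already inside U(p), ..., U(q+1) that satisfy (B.27)
        chosen[q] = [R for R in partition(q, p, j)
                     if not R <= covered and 8 * len(R & C) >= len(R)]
        for R in chosen[q]:
            covered |= R
    return chosen
```

Because the partitions are nested, a rectangle is either inside the region already covered or disjoint from it, which is what makes the sets $U(q)$ disjoint.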
Moreover
$$C\subset\bigcup_{3\le q\le p}U(q). \tag{B.29}$$
This is simply because if $(k,\ell)\in C$ and $(k,\ell)\in R\in\mathcal D(3)$, then if $(k,\ell)\notin\bigcup_{q>3}U(q)$, we have $R\subset U(3)$ since (B.27) holds because $\operatorname{card}R=8$. We also note that
$$R\in\mathcal D(q),\ q\le p-1,\ R\subset U(q)\ \Rightarrow\ \operatorname{card}(R\cap C)<\frac12\operatorname{card}R. \tag{B.30}$$
Indeed, if $R'\supset R$ and $R'\in\mathcal D(q+1)$, then $\operatorname{card}R'\le 4\operatorname{card}R$. Since $R\subset U(q)$, we have $R'\not\subset U(q+1)$, so that $R'$ does not satisfy (B.27) and
$$\operatorname{card}(R\cap C)\le\operatorname{card}(R'\cap C)<\frac18\operatorname{card}R'\le\frac12\operatorname{card}R.$$
Now (B.29) implies
$$h=\sum h\mathbf 1_R, \tag{B.31}$$
where the summation is over $3\le q\le p$, $R\in\mathcal D(q)$ and $R\subset U(q)$. Writing $R=\{k_1,\dots,k_2\}\times\{\ell_1,\dots,\ell_2\}$ as in Proposition 17.3.2, we observe by construction that, first, (17.43) holds for $q\ge j$ and that the function $h\mathbf 1_R$ satisfies (17.44) to (17.46); second, that (17.49) holds for $3\le q<j$, and the function $h\mathbf 1_R$ then satisfies (17.50) to (17.52).

We turn to the proof of (B.24). We start with the typical case, $R\in\mathcal D(q)$, $3\le q<p$. Then (B.30) implies that there exists $(k_0,\ell_0)\in R$ with $h(k_0,\ell_0)=0$, and (17.39) implies, using also (17.43),
$$|h(k,\ell)|=|h(k,\ell)-h(k_0,\ell_0)|\le|h(k,\ell)-h(k,\ell_0)|+|h(k,\ell_0)-h(k_0,\ell_0)|\le 2^j|\ell-\ell_0|+|k-k_0|\le 2(k_2-k_1),$$
and this proves (B.24).
Next, we consider the case $q=p$, so that $R\in\mathcal D(p)$ and $R=\{1,\dots,2^p\}\times\{b2^{p-j}+1,\dots,(b+1)2^{p-j}\}$. Given an integer $r$, define
$$R_r=G\cap\big(\{1,\dots,2^p\}\times\{b2^{p-j}+1-r,\dots,(b+1)2^{p-j}+r\}\big).$$
Then, for $r\le 2^p$, we have
$$\operatorname{card}\big(\{b2^{p-j}+1-r,\dots,(b+1)2^{p-j}+r\}\cap\{1,\dots,2^p\}\big)\ge r/2.$$
Thus $\operatorname{card}R_r\ge 2^pr/2$, so that if $2^pr\ge 2^{2p-j+1}$ then $\operatorname{card}R_r\ge 2^{2p-j}$, and therefore, $R_r$ contains a point $(k,\ell')$ with $h(k,\ell')=0$. Then $R$ contains a point $(k,\ell)$ with $|\ell-\ell'|\le r$, so that the second part of (17.39) implies
$$|h(k,\ell)|\le 2^jr.$$
Assuming that we choose $r$ as small as possible with $2^pr\ge 2^{2p-j+1}$, we then have
$$|h(k,\ell)|\le L2^j2^{2p-j}2^{-p}\le L2^p,$$
and (17.39) shows that this remains true for each point $(k,\ell)$ of $R$, completing the proof of (B.24).

Consequently for $R\in\mathcal D(q)$, $R\subset U(q)$ and $j\le q\le p$, we can use (17.48), which implies


| rome us) ~ f medy| < L./pmo 24? card R . (B.32) 


i<N 


Moreover for $3\le q<j$, this inequality remains true from (17.53). Recalling (B.31), summation of these inequalities over $3\le q\le p$, $R\in\mathcal D(q)$, $R\subset U(q)$ yields (17.55) and completes the proof. □


B.5 Proof of Proposition 17.3.1 


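In this section, we prove Proposition 17.3.1. We will denote by $I$ an interval of $\{1,\dots,2^p\}$, that is a set of the type
$$I=\{k;\ k_1\le k\le k_2\}.$$
Lemma B.5.1 Consider a map $f:\{1,\dots,2^p\}\to\mathbb R^+$, a number $a>0$ and
$$A=\Big\{\ell;\ \exists I,\ \ell\in I,\ \sum_{\ell'\in I}f(\ell')\ge a\operatorname{card}I\Big\}.$$
Then
$$\operatorname{card}A\le\frac La\sum_{\ell\in A}f(\ell).$$
Proof This uses a discrete version of the classical Vitali covering theorem (with the same proof). Namely, a family $\mathcal I$ of intervals contains a disjoint family $\mathcal I'$ such that
$$\operatorname{card}\bigcup_{I\in\mathcal I}I\le L\operatorname{card}\bigcup_{I\in\mathcal I'}I=L\sum_{I\in\mathcal I'}\operatorname{card}I.$$
We use this for $\mathcal I=\{I;\ \sum_{\ell'\in I}f(\ell')\ge a\operatorname{card}I\}$, so that $A=\bigcup_{I\in\mathcal I}I$ and $\operatorname{card}A\le L\sum_{I\in\mathcal I'}\operatorname{card}I$. Since $\sum_{\ell'\in I}f(\ell')\ge a\operatorname{card}I$ for $I\in\mathcal I'$, and since the intervals of $\mathcal I'$ are disjoint and contained in $A$, we have $a\sum_{I\in\mathcal I'}\operatorname{card}I\le\sum_{\ell\in A}f(\ell)$. □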
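Lemma B.5.1 can be tested by brute force. In the sketch below (our own helper; indices start at 0 and values are integers, so all comparisons are exact), the set $A$ is computed directly from its definition and the conclusion is checked with $L=3$, the constant produced by the usual Vitali selection: choose a longest interval, discard the intervals meeting it, and repeat; each discarded interval lies in the threefold enlargement of a chosen one.

```python
def exceptional_set(f, a):
    """Indices l lying in some interval I with sum_{l' in I} f(l') >= a * card(I)."""
    n = len(f)
    prefix = [0]
    for value in f:
        prefix.append(prefix[-1] + value)
    A = set()
    for lo in range(n):
        for hi in range(lo, n):
            # interval I = {lo, ..., hi}
            if prefix[hi + 1] - prefix[lo] >= a * (hi - lo + 1):
                A.update(range(lo, hi + 1))
    return A
```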


Proof of Proposition 17.3.1. We consider $h\in\mathcal H$, and for $j\ge 2$, we define
$$B(j)=\Big\{(k,\ell)\in G;\ \exists I,\ \ell\in I,\ \sum_{\ell'\in I}|h(k,\ell'+1)-h(k,\ell')|\ge 2^j\operatorname{card}I\Big\}.$$
We claim that when $r,s,\ell\le 2^p$, then
$$(r,s)\notin B(j)\ \Rightarrow\ |h(r,\ell)-h(r,s)|\le 2^j|\ell-s|. \tag{B.33}$$
To see this, assuming for specificity that $s<\ell$, we note that
$$|h(r,\ell)-h(r,s)|\le\sum_{\ell'\in I}|h(r,\ell'+1)-h(r,\ell')|\le 2^j\operatorname{card}I,$$
where $I=\{s,s+1,\dots,\ell-1\}$, and where the last inequality follows from the fact that $s\in I$ and $(r,s)\notin B(j)$.

Now we use Lemma B.5.1 for each $k$, for the function $f_k(\ell)=|h(k,\ell+1)-h(k,\ell)|$ and for $a=2^j$. Summing over $k$, we obtain
$$\operatorname{card}B(j)\le\frac L{2^j}\sum_{(k,\ell)\in B(j)}|h(k,\ell+1)-h(k,\ell)|.$$
Now (17.14) implies $\sum_{k,\ell}|h(k,\ell+1)-h(k,\ell)|\le 2^{2p}$, and therefore, we get
$$\operatorname{card}B(j)\le L_12^{2p-j}. \tag{B.34}$$
We consider the smallest integer $j_0$ such that $L_12^{-j_0}\le 1/4$, so that $L_1\le 2^{j_0-2}$, and hence for $j\ge j_0$, we have
$$\operatorname{card}B(j)\le 2^{2p-j+j_0-2} \tag{B.35}$$
and in particular $B(j)\ne G$. For $j\ge j_0$, we define
$$g_j(k,\ell)=\min\{h(r,s)+|k-r|+2^j|\ell-s|;\ (r,s)\notin B(j)\}.$$
The idea here is that $g_j$ is a regularization of $h$. The larger $j$, the better $g_j$ approximates $h$, but this comes at the price that the larger $j$, the less regular $g_j$ is. We will simply use these approximations to write
$$h=g_{j_0}+(g_{j_0+1}-g_{j_0})+\cdots$$
to obtain the desired decomposition (17.41).
It is obvious that for $(k,\ell)\notin B(j)$, we have $g_j(k,\ell)\le h(k,\ell)$ and that
$$|g_j(k+1,\ell)-g_j(k,\ell)|\le 1, \tag{B.36}$$
$$|g_j(k,\ell+1)-g_j(k,\ell)|\le 2^j, \tag{B.37}$$
since $g_j$ is the minimum over $(r,s)\notin B(j)$ of the functions $(k,\ell)\mapsto h(r,s)+|k-r|+2^j|\ell-s|$, which satisfy the same properties. Consider $(r,s)\notin B(j)$. Then (B.33) yields
$$|h(r,\ell)-h(r,s)|\le 2^j|\ell-s|,$$
while the first part of (17.39) yields
$$|h(r,\ell)-h(k,\ell)|\le|k-r|,$$
and thus, we have proved that
$$(r,s)\notin B(j)\ \Rightarrow\ |h(k,\ell)-h(r,s)|\le|k-r|+2^j|\ell-s|. \tag{B.38}$$
This implies that $g_j(k,\ell)\ge h(k,\ell)$ for all $(k,\ell)\in G$. Consequently, since we already observed that $g_j(k,\ell)\le h(k,\ell)$ for $(k,\ell)\notin B(j)$, we have proved that
$$(k,\ell)\notin B(j)\ \Rightarrow\ g_j(k,\ell)=h(k,\ell). \tag{B.39}$$
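The regularization $g_j$ is an infimal convolution, and its properties can be checked on a small square grid. A minimal sketch (indices from 0; the function name is ours): the test builds an $h$ for which the conclusion of (B.38) holds for all pairs of points (a minimum of "cones"), computes $g_j$ from a random set playing the role of $B(j)$, and checks (B.36), (B.37) and (B.39).

```python
def regularize(h, bad, j):
    """g_j(k, l) = min{h(r, s) + |k - r| + 2^j |l - s| ; (r, s) not in bad} on an n x n grid."""
    n = len(h)
    good = [(r, s) for r in range(n) for s in range(n) if (r, s) not in bad]
    return [[min(h[r][s] + abs(k - r) + 2 ** j * abs(l - s) for r, s in good)
             for l in range(n)]
            for k in range(n)]
```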
We define $h_1=g_{j_0}$, so that $h_1\in L\mathcal H$ by (B.36) and (B.37). For $j>1$, we define $h_j=g_{j+j_0-1}-g_{j+j_0-2}$. By (B.39), and since $B(j+j_0-1)\subset B(j+j_0-2)$, for $(k,\ell)\notin B(j+j_0-2)$, we have $g_{j+j_0-2}(k,\ell)=h(k,\ell)=g_{j+j_0-1}(k,\ell)$, so that $h_j(k,\ell)=0$. Consequently,
$$h_j(k,\ell)\ne 0\ \Rightarrow\ (k,\ell)\in B(j+j_0-2),$$
and thus, from (B.35),
$$\operatorname{card}\{(k,\ell);\ h_j(k,\ell)\ne 0\}\le 2^{2p-j}.$$
Combining with (B.36) and (B.37), we obtain $h_j\in L\mathcal H_j(2^{2p-j})$.
Now for $j>2p$, we have $B(j)=\emptyset$ (since for each $k$ and $\ell$, we have $|h(k,\ell+1)-h(k,\ell)|\le 2^{2p}$ by (17.14)), so that then $g_j=h$ from (B.39). Consequently, $h_j=0$ for large $j$ and thus $h=\sum_{j\ge 1}h_j$. □


B.6 Proof of Proposition 17.2.4 

The next lemmas prepare for the proof of Proposition 17.2.4. 

Lemma B.6.1 Consider numbers $(v_k)_{k\le 2^p}$ and $(v'_k)_{k\le 2^p}$. We define
$$g(k)=\inf\{v_r+|k-r|;\ 1\le r\le 2^p\};\qquad g'(k)=\inf\{v'_r+|k-r|;\ 1\le r\le 2^p\}. \tag{B.40}$$
Then
$$\sum_{k\le 2^p}|g'(k)-g(k)|\le\sum_{k\le 2^p}\big(|v_k-v'_k|+v_k-g(k)+v'_k-g'(k)\big). \tag{B.41}$$
Proof Obviously, $g(k)\le v_k$ and $g'(k)\le v'_k$. If $g'(k)\ge g(k)$, then
$$g'(k)-g(k)\le v'_k-g(k)=v'_k-v_k+v_k-g(k)\le|v'_k-v_k|+v_k-g(k)+v'_k-g'(k).$$
A similar argument when $g(k)>g'(k)$ and summation finish the proof. □
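Inequality (B.41) is elementary; a quick randomized check can be sketched as follows (the helper name is ours, and integer data keep comparisons exact). The function $g$ of (B.40) is the largest 1-Lipschitz function below $v$.

```python
def envelope(v):
    """g(k) = min over r of v[r] + |k - r|, as in (B.40)."""
    n = len(v)
    return [min(v[r] + abs(k - r) for r in range(n)) for k in range(n)]
```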


We consider numbers $u(k,\ell)$ for $(k,\ell)\in G$, and $h(k,\ell)$ as in (17.31). We set
$$v(k,\ell)=\min\{u(k,s);\ |\ell-s|\le 1\}, \tag{B.42}$$
so that
$$h(k,\ell)=\inf\{v(r,\ell)+|k-r|;\ 1\le r\le 2^p\}. \tag{B.43}$$
We observe that $v(k,\ell)\le u(k,\ell)$. We lighten notation by writing $n(k,\ell)$ for $n(\tau)$ when $\tau=(k,\ell)$.


Lemma B.6.2 We have
$$m_0\sum_{k\le 2^p,\ \ell<2^p}|v(k,\ell+1)-v(k,\ell)|\le 10\sum_{k,\ell\le 2^p}n(k,\ell)\big(u(k,\ell)-v(k,\ell)\big). \tag{B.44}$$
Proof We observe that $|a-b|=a+b-2\min(a,b)$ and that
$$v(k,\ell)\le\min\big(u(k,\ell+1),u(k,\ell)\big),\qquad v(k,\ell+1)\le\min\big(u(k,\ell+1),u(k,\ell)\big).$$
Thus
$$|u(k,\ell+1)-u(k,\ell)|=u(k,\ell)+u(k,\ell+1)-2\min\big(u(k,\ell+1),u(k,\ell)\big)\le u(k,\ell)-v(k,\ell)+u(k,\ell+1)-v(k,\ell+1).$$
By summation, we get
$$\sum_{k\le 2^p,\ \ell<2^p}|u(k,\ell+1)-u(k,\ell)|\le 2\sum_{k,\ell\le 2^p}\big(u(k,\ell)-v(k,\ell)\big),$$
and since $m_0\le n(k,\ell)$ for $(k,\ell)\in G$ by (17.9),
$$m_0\sum_{k\le 2^p,\ \ell<2^p}|u(k,\ell+1)-u(k,\ell)|\le 2\sum_{k,\ell\le 2^p}n(k,\ell)\big(u(k,\ell)-v(k,\ell)\big). \tag{B.45}$$
Now
$$|u(k,\ell)-v(k,\ell)|\le|u(k,\ell+1)-u(k,\ell)|+|u(k,\ell-1)-u(k,\ell)|,$$
so that
$$\begin{aligned}|v(k,\ell+1)-v(k,\ell)|&\le|v(k,\ell+1)-u(k,\ell+1)|+|u(k,\ell+1)-u(k,\ell)|+|u(k,\ell)-v(k,\ell)|\\&\le|u(k,\ell-1)-u(k,\ell)|+3|u(k,\ell+1)-u(k,\ell)|+|u(k,\ell+2)-u(k,\ell+1)|.\end{aligned} \tag{B.46}$$
Plugging (B.46) in the left-hand side of (B.44) and using (B.45) prove (B.44), the factor 10 being $2(1+3+1)$. □
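When all the weights $n(k,\ell)$ are equal to $m_0$ (a simplifying assumption made only for this illustration), (B.44) and (B.45) reduce to unweighted inequalities that hold column by column, with terms whose indices fall outside the column simply omitted. A minimal randomized check, with a helper name of our own choosing:

```python
def neighbourhood_min(u):
    """v(l) = min{u(s) : |l - s| <= 1} for one column l = 0, ..., n - 1, as in (B.42)."""
    n = len(u)
    return [min(u[max(0, l - 1):l + 2]) for l in range(n)]
```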


Proof of Proposition 17.2.4. Given $1\le\ell<2^p$, we use Lemma B.6.1 for $v_k=v(k,\ell)$ and $v'_k=v(k,\ell+1)$, where $v(k,\ell)$ is given by (B.42). Thus, by (B.43), $g(k)=h(k,\ell)$ and $g'(k)=h(k,\ell+1)$. Summing the inequalities (B.41) over $1\le\ell<2^p$, we get
$$\sum_{k\le 2^p,\ \ell<2^p}|h(k,\ell+1)-h(k,\ell)|\le 2\sum_{k,\ell}\big(v(k,\ell)-h(k,\ell)\big)+\sum_{k\le 2^p,\ \ell<2^p}|v(k,\ell)-v(k,\ell+1)|.$$
Using (B.44) and since $m_0\le n(k,\ell)$, we get
$$\begin{aligned}m_0\sum_{k\le 2^p,\ \ell<2^p}|h(k,\ell+1)-h(k,\ell)|&\le 2\sum_{k,\ell}n(k,\ell)\big(v(k,\ell)-h(k,\ell)\big)+10\sum_{k,\ell}n(k,\ell)\big(u(k,\ell)-v(k,\ell)\big)\\&\le 10\sum_{k,\ell}n(k,\ell)\big(u(k,\ell)-h(k,\ell)\big),\end{aligned}$$
using that $h(k,\ell)\le v(k,\ell)\le u(k,\ell)$ in the last line. This proves (17.33), and (17.32) is obvious. □


Appendix C 
Classical View of Infinitely Divisible 
Processes 


In this appendix, we explain the classical view of infinitely divisible processes and 
why it coincides with our direct definition of Sect. 12.2. 


C.1 Infinitely Divisible Random Variables 


Definition C.1.1 We say that a r.v. $X$ is positive infinitely divisible if there exists a positive measure $\nu$ on $\mathbb R^+$ such that
$$\int(\beta\wedge 1)\,d\nu(\beta)<\infty, \tag{C.1}$$
$$\forall\alpha\in\mathbb R,\quad\mathsf E\exp i\alpha X=\exp\Big(-\int\big(1-\exp(i\alpha\beta)\big)\,d\nu(\beta)\Big). \tag{C.2}$$
The use of (C.1) is to ensure that the integral in the right-hand side of (C.2) makes sense. To motivate this definition, let us recall the definition (12.1) of Poisson r.v.s. Consider finitely many independent Poisson r.v.s $X_k$ with $\mathsf EX_k=a_k$, and numbers $\beta_k\ge 0$. Then, by independence, (12.3) implies
$$\mathsf E\exp\Big(i\alpha\sum_k\beta_kX_k\Big)=\exp\Big(-\sum_ka_k\big(1-\exp(i\alpha\beta_k)\big)\Big)=\exp\Big(-\int\big(1-\exp(i\alpha\beta)\big)\,d\nu(\beta)\Big), \tag{C.3}$$
where $\nu$ is the discrete positive measure on $\mathbb R^+$ such that for each $\beta\in\mathbb R^+$, we have $\nu(\{\beta\})=\sum\{a_k;\ \beta_k=\beta\}$. Let us observe the formula
$$\mathsf E\sum_k\beta_kX_k=\sum_k\beta_ka_k=\int\beta\,d\nu(\beta).$$
It is appropriate to think of a positive infinitely divisible r.v. $X$ as a (continuous) sum of independent r.v.s of the type $\beta Y$ where $Y$ is a Poisson r.v. and $\beta\ge 0$. This is a sum of quantities that are $\ge 0$, and there is no cancellation in this sum. The r.v. $X$ has an expectation if and only if $\int\beta\,d\nu(\beta)<\infty$ (and the value of this expectation is then $\int\beta\,d\nu(\beta)$).


Definition C.1.2 We say that a r.v. $X$ is infinitely divisible (real, symmetric, without Gaussian component) if there exists a positive measure $\nu$ on $\mathbb R^+$ such that
$$\int(\beta^2\wedge 1)\,d\nu(\beta)<\infty, \tag{C.4}$$
$$\forall\alpha\in\mathbb R,\quad\mathsf E\exp i\alpha X=\exp\Big(-\int\big(1-\cos(\alpha\beta)\big)\,d\nu(\beta)\Big). \tag{C.5}$$
The use of (C.4) is to ensure the existence of the integral in the right-hand side of (C.5). We shall prove the existence of $X$ in Sect. C.3. To motivate this definition, consider again a Poisson r.v. $Y$ of expectation $a$ and an independent copy $Y'$ of $Y$. Then (12.3) implies
$$\mathsf E\exp i\alpha(Y-Y')=\exp\big(-2a(1-\cos\alpha)\big). \tag{C.6}$$
Thus, when a r.v. $X$ is a sum of independent terms $\beta_k(Y_k-Y'_k)$ where $Y_k$ and $Y'_k$ are independent Poisson r.v.s of expectation $a_k$ and $\beta_k\ge 0$, it satisfies (C.5), where now $\nu$ is the discrete positive measure on $\mathbb R^+$ such that $\nu(\{\beta\})=2\sum\{a_k;\ \beta_k=\beta\}$ for each $\beta\in\mathbb R^+$.
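Formulas (C.3), (C.5) and (C.6) rest on the characteristic function of a Poisson r.v., which can be checked numerically by summing the Poisson probabilities directly. A minimal sketch (the helper name and the numerical values are ours, chosen only for illustration):

```python
import cmath
import math


def poisson_char(a, t, terms=200):
    """E exp(i t Y) for Y Poisson with mean a, summed from the Poisson probabilities."""
    total = 0j
    prob = math.exp(-a)  # P(Y = 0)
    for n in range(terms):
        total += prob * cmath.exp(1j * t * n)
        prob *= a / (n + 1)  # P(Y = n + 1) from P(Y = n)
    return total
```

Multiplying $\mathsf E\exp(i\alpha\beta Y)$ by $\mathsf E\exp(-i\alpha\beta Y')$ gives $\exp(-2a(1-\cos(\alpha\beta)))$, and taking a product over independent terms $\beta_k(Y_k-Y'_k)$ reproduces (C.5) for the discrete measure $\nu$ described above.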

It is appropriate to think of an infinitely divisible r.v. $X$ as a continuous sum of independent r.v.s of the type $\beta(Y-Y')$ where $Y$ and $Y'$ are independent Poisson r.v.s with the same expectation. These r.v.s are symmetric rather than positive, and there is a lot of cancellation when one adds them. This is why the formula (C.5) makes sense under the condition (C.4) rather than the much stronger condition (C.1).¹

¹ This dichotomy, no cancellation versus cancellation, is a leitmotiv of the book.
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C.2 Infinitely Divisible Processes 


Consider a finite set T and let us denote by 6 = (B(t))rer a generic point of R’.A 
stochastic process (X;)rer is called (real, symmetric, without Gaussian component) 
infinitely divisible if there exists a positive measure v on R? such that Jer (6 (t)2 A 
1)dv(B) < oo for all ¢ in 7, and such that for all families (@,;);<-7 of real numbers, 
we have 


Eexpi Sa X: = exp(- 


teT R 


; (1 —cos (> a B()) )dv(B)) (C7) 


teT 


The positive measure v is called the Lévy measure of the process.” Each of the linear 
combinations yi er 1X; is an infinitely divisible r.v. 


Exercise C.2.1 Assume that for $t\in T$, the r.v.s $X_t$ are infinitely divisible and that these r.v.s are independent. Prove that $(X_t)_{t\in T}$ is an infinitely divisible process.


As an example of an infinitely divisible process, assume that ν consists of a mass a at a point $\beta\in\mathbb{R}^T$. Then, in distribution, $(X_t)_{t\in T}=(\beta(t)(Y-Y'))_{t\in T}$ where Y and Y′ are independent Poisson r.v.s of expectation a/2. One can view the formula (C.7) as saying that the general case is obtained by taking a (kind of continuous) sum of independent processes of the previous type. Lots of cancellations occur when taking such sums.

For the purpose of studying the supremum of a process, our definition of infinitely divisible processes is the most general one: It is essentially not a restriction to consider only the symmetric case (using the familiar symmetrization procedure which replaces the process $(X_t)$ by the process $(X_t-X'_t)$ where $(X'_t)$ is an independent copy of the process $(X_t)$), and it is not a real restriction to exclude Gaussian components, which are very well understood.

When T is infinite, we still say that the process $(X_t)_{t\in T}$ is infinitely divisible if (C.7) holds for each family $(\alpha_t)_{t\in T}$ such that only finitely many coefficients are not 0. An infinitely divisible process indexed by T is thus parameterized by a σ-finite measure on $\mathbb{R}^T$ (with the sole restriction that $\int_{\mathbb{R}^T}(\beta(t)^2\wedge 1)\,d\nu(\beta)<\infty$ for each $t\in T$).³ Only some extremely special subclasses have yet been studied in any detail. The best known such subclass is that of infinitely divisible processes with stationary increments. Then $T=\mathbb{R}^+$ and ν is the image of $\mu\otimes\lambda$ under the map $(x,u)\mapsto(x\mathbf{1}_{\{s\ge u\}})_{s\in\mathbb{R}^+}$, where μ is a positive measure on $\mathbb{R}$ such that $\int_{\mathbb{R}}(x^2\wedge 1)\,d\mu(x)<\infty$ and where λ is Lebesgue measure.


² Assuming without loss of generality that $\nu(\{0\})=0$, it is unique.

³ One then considers the Lévy measure as a "cylindrical measure" that is known through its projections on $\mathbb{R}^S$ for S a finite subset of T, projections that are positive measures and satisfy the obvious compatibility conditions on how these projections relate to each other. It is sometimes necessary to go beyond this naive point of view. The final word on how to define the Lévy measure seems to be [93], but we are not really concerned with these matters here.
C.3 Representation


We show that a process as in Definition 12.2.1 is indeed "an infinitely divisible process of Lévy measure ν" in the classical sense of (C.7). We simply have to show that if $X_t=\sum_{j\ge1}\varepsilon_jZ_j(t)$, then


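$$\mathrm{E}\exp i\sum_{t\in T}\alpha_tX_t = \exp\Bigl(-\int_{\mathbb{R}^T}\Bigl(1-\cos\Bigl(\sum_{t\in T}\alpha_t\beta(t)\Bigr)\Bigr)\,d\nu(\beta)\Bigr). \tag{C.8}$$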


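Leaving some convergence details to the reader, we first take expectation $\mathrm{E}_\varepsilon$ in the r.v.s $\varepsilon_j$ given the $Z_j$ to obtain, setting $u_j=\sum_{t\in T}\alpha_tZ_j(t)$,

$$\mathrm{E}_\varepsilon\exp i\sum_{j\ge1}\varepsilon_ju_j = \prod_{j\ge1}\cos u_j = \exp\sum_{j\ge1}\log\cos u_j, \tag{C.9}$$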


and we simply take expectation using the formula (12.8) to obtain (C.8). Conversely, an infinitely divisible process of Lévy measure ν has the representation $X_t=\sum_{j\ge1}\varepsilon_jZ_j(t)$ where $(Z_j)_{j\ge1}$ is a realization of a Poisson point process of intensity measure ν, the Lévy measure of the process.
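When ν is a finite measure, the representation $X_t=\sum_{j\ge1}\varepsilon_jZ_j(t)$ can be simulated directly, because a Poisson point process of finite intensity ν is then a Poisson number (of expectation $\nu(\mathbb{R}^T)$) of independent points with law $\nu/\nu(\mathbb{R}^T)$. The sketch below is our own illustration under stated assumptions: card T = 1, a hypothetical Lévy measure `ATOMS` with two atoms, and helper names of our choosing. It compares a Monte Carlo estimate of $\mathrm{E}\cos(\alpha X)$ with the right-hand side of (C.8). It covers only the finite-mass case; the general case needs the convergence details mentioned above.

```python
import math
import random


def sample_poisson(lam, rng):
    """Poisson(lam) sample by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_X(atoms, rng):
    """One draw of X = sum_j eps_j Z_j in the case card T = 1.

    `atoms` is a list of (beta, mass) pairs describing a finite Levy measure nu.
    The points Z_j form a Poisson point process of intensity nu: their number is
    Poisson(nu(R)) and, given that number, they are i.i.d. with law nu / nu(R).
    The eps_j are independent random signs.
    """
    betas = [b for b, _ in atoms]
    masses = [m for _, m in atoms]
    n_points = sample_poisson(sum(masses), rng)
    x = 0.0
    for _ in range(n_points):
        z = rng.choices(betas, weights=masses)[0]
        x += rng.choice((-1.0, 1.0)) * z
    return x


def predicted_cf(alpha, atoms):
    """exp(-int (1 - cos(alpha*beta)) dnu(beta)), the right-hand side of (C.8)."""
    return math.exp(-sum(m * (1.0 - math.cos(alpha * b)) for b, m in atoms))


def empirical_cf(alpha, atoms, n_samples, rng):
    """Monte Carlo estimate of E exp(i*alpha*X) = E cos(alpha*X) (X is symmetric)."""
    return sum(math.cos(alpha * sample_X(atoms, rng))
               for _ in range(n_samples)) / n_samples


ATOMS = [(1.0, 1.0), (2.0, 0.5)]  # hypothetical nu: mass 1 at 1, mass 1/2 at 2
rng = random.Random(12345)
print(empirical_cf(0.8, ATOMS, 50_000, rng), predicted_cf(0.8, ATOMS))
```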


C.4 p-Stable Processes


Finally, we prove the claim that p-stable processes are infinitely divisible and conditionally Gaussian. It is proved in [53], Theorem 5.2, that a p-stable process has a spectral measure and that there exists a finite positive measure m on $\mathbb{R}^T$ such that for any family $(\alpha_t)_{t\in T}$ we have

$$\mathrm{E}\exp i\sum_{t\in T}\alpha_tX_t = \exp\Bigl(-\frac12\int_{\mathbb{R}^T}\Bigl|\sum_{t\in T}\alpha_t\beta(t)\Bigr|^p\,dm(\beta)\Bigr). \tag{C.10}$$


We observe the formula

$$\int_{\mathbb{R}^+}\bigl(1-\cos(ax^{-1/p})\bigr)\,dx = C(p)\,|a|^p/2,$$

which is obvious through change of variable. Let us denote by λ Lebesgue's measure on $\mathbb{R}^+$. Consider the positive measure ν on $\mathbb{R}^T$ such that

$$C(p)\,\nu \text{ is the image of } \lambda\otimes m \text{ under the map } (x,y)\mapsto x^{-1/p}y. \tag{C.11}$$
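Then

$$\int_{\mathbb{R}^T}\Bigl(1-\cos\Bigl(\sum_{t\in T}\alpha_t\beta(t)\Bigr)\Bigr)\,d\nu(\beta) = C(p)^{-1}\int_{\mathbb{R}^T}\int_{\mathbb{R}^+}\Bigl(1-\cos\Bigl(x^{-1/p}\sum_{t\in T}\alpha_ty(t)\Bigr)\Bigr)\,dx\,dm(y) = \frac12\int_{\mathbb{R}^T}\Bigl|\sum_{t\in T}\alpha_ty(t)\Bigr|^p\,dm(y), \tag{C.12}$$

and combining with (C.10), this shows that ν is a Lévy measure for the process $(X_t)_{t\in T}$.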
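For the reader who wants to see the change of variable behind the formula preceding (C.11), here is one way to carry it out (a sketch; the closed form at the end is a classical integral that the argument does not need). Assume $0<p<2$ and $a\ne0$, and substitute $x=|a|^py^{-p}$, so that $ax^{-1/p}=\pm y$ and $dx=p|a|^py^{-p-1}\,dy$:

```latex
\int_{\mathbb{R}^+}\bigl(1-\cos(a x^{-1/p})\bigr)\,dx
  = p\,|a|^p\int_0^\infty (1-\cos y)\,y^{-1-p}\,dy ,
\qquad\text{so}\qquad
C(p) = 2p\int_0^\infty (1-\cos y)\,y^{-1-p}\,dy .
```

The last integral converges exactly when $0<p<2$: near 0 because $1-\cos y\le y^2/2$, and at infinity because $p>0$. Evaluating it gives $C(p)=2\Gamma(1-p)\cos(\pi p/2)$ for $p\ne1$ and $C(1)=\pi$.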

To prove that a p-stable process is conditionally Gaussian, consider then a finite positive measure m on $\mathbb{R}^T$ and the measure ν as in (C.11). Consider a Poisson point process $(Z_j)_{j\ge1}$ of intensity measure ν and independent standard Gaussian r.v.s $(g_j)$. Then one checks as previously that for $a>0$, the process $X_t=a\sum_{j\ge1}g_jZ_j(t)$ is p-stable with spectral measure $a^pK(p)m$ for a certain constant $K(p)$. Consequently, if the p-stable process $(X_t)_{t\in T}$ has spectral measure m, it has the same distribution as the process $(K(p)^{-1/p}\sum_{j\ge1}g_jZ_j(t))_{t\in T}$.


Appendix D 
Reading Suggestions 


It has been a deliberate choice not to include results of other authors which were 
proved later than the first edition of this book, as we try to present only results in 
their final form. The single exception to this policy concerns the recent results of G. 
Pisier in Sect. 19.4. In this appendix, we point out some directions connected to the 
main ideas of this book. 


D.1 Partition Schemes 


The work of R. van Handel [140, 141] attempts to provide a new view on partition 
schemes. The formulation of Theorem 2.9.8 is directly inspired by this work 
although in a sense, this theorem is a hybrid between the methods of van Handel 
and the original methods of the author. 


D.2 Geometry of Metric Spaces 


Let us recall that a distance δ on a set T is called ultrametric if

$$\forall s,t,v\in T,\quad \delta(s,t)\le\max(\delta(s,v),\delta(t,v)).$$


Given $A\ge1$, let us say that a subset S of a metric space (T, d) is an A distortion of an ultrametric space if for some ultrametric distance δ on S we have

$$\forall s,t\in S,\quad \delta(s,t)\le d(s,t)\le A\,\delta(s,t).$$
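For finite spaces, both definitions can be tested mechanically. The following sketch is our own illustration (distances are given as symmetric matrices indexed by 0, ..., n−1, and the function names are ours): it checks the ultrametric inequality over all triples, then the two-sided comparison $\delta\le d\le A\delta$, with a small tolerance for floating-point input.

```python
from itertools import product


def is_ultrametric(delta, tol=1e-12):
    """Check delta(s,t) <= max(delta(s,v), delta(t,v)) for all triples s, t, v."""
    n = len(delta)
    return all(delta[s][t] <= max(delta[s][v], delta[t][v]) + tol
               for s, t, v in product(range(n), repeat=3))


def is_A_distortion(d, delta, A, tol=1e-12):
    """Check that delta is ultrametric and delta <= d <= A * delta on all pairs."""
    n = len(d)
    return is_ultrametric(delta, tol) and all(
        delta[s][t] - tol <= d[s][t] <= A * delta[s][t] + tol
        for s, t in product(range(n), repeat=2))


# The points 0, 1, 3 of the real line, and an ultrametric (tree) distance on them.
d = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
delta = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
print(is_ultrametric(d), is_ultrametric(delta), is_A_distortion(d, delta, 1.5))
# prints: False True True
```

The three points of the line are not ultrametric ($d(0,3)=3>\max(1,2)$), but they form a 3/2 distortion of the ultrametric δ; the brute-force triple loop is only meant for small examples.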
One may then investigate, given a metric space, whether it contains subsets which are A distortions of ultrametric spaces, and which are in a sense large. This question is investigated in great detail in the papers [65] and [67]. The equivalence of the quantity (3.13) with $\gamma_2(T,d)$ means that for any metric space, one may find a subset S which is an L distortion of an ultrametric space and for which $\gamma_2(S,d)\ge\gamma_2(T,d)/L$, and the results of [65] and [67] can be seen as far-reaching extensions of this result. We quote here the main result of [67].


Theorem D.2.1 Given $A>1$, there exists a constant $K=K(A)$ with the following property. Consider a metric space (T, d) and a probability measure μ on T. Then there is a subset S of T which is an A distortion of an ultrametric space, and a probability measure ν on S such that for each $x\in T$ and each $r>0$, one has

$$\nu(B(x,r))\le\mu(B(x,Kr))^{1-1/A}.$$


D.3 Cover Times 


Consider a connected graph with set of edges E and set of vertices V. Starting with a given vertex v, we consider the following random walk: At each step, if the walker is at a given vertex w, he chooses at random with equal probability to move to one of the vertices connected to w by an edge. The cover time $\tau_v$ is defined as the first time all vertices have been visited. On the other hand, let us think of the graph as an electrical network, each edge being given conductance 1. Then the effective resistance $R(u,v)$ of the network between vertices u and v defines a distance R on V. The main result of [28] is very clean:

$$\frac1L\operatorname{card}E\,\gamma_2(V,\sqrt R)^2\le\max_{v\in V}\mathrm{E}\tau_v\le L\operatorname{card}E\,\gamma_2(V,\sqrt R)^2.$$
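The cover time is easy to simulate. The sketch below is our own illustration: it runs the random walk described above on a hypothetical test graph, the cycle with n vertices, for which the expected cover time from any starting vertex is the classical value n(n−1)/2. It illustrates the definition of $\tau_v$ only; it does not compute effective resistances or test the theorem of [28].

```python
import random


def cover_time(adj, start, rng):
    """Steps of the simple random walk from `start` until every vertex is visited.

    `adj` maps each vertex to the list of its neighbours; at each step the walker
    moves to a neighbour chosen uniformly at random.
    """
    visited = {start}
    current, steps = start, 0
    while len(visited) < len(adj):
        current = rng.choice(adj[current])
        visited.add(current)
        steps += 1
    return steps


def cycle_graph(n):
    """Adjacency lists of the cycle with vertices 0, ..., n-1."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}


def mean_cover_time(adj, start, trials, rng):
    """Monte Carlo estimate of E tau_start."""
    return sum(cover_time(adj, start, rng) for _ in range(trials)) / trials


rng = random.Random(0)
n = 10
print(mean_cover_time(cycle_graph(n), 0, 5_000, rng), n * (n - 1) / 2)
```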


D.4 Matchings


Under the lead in particular of Giorgio Parisi, physicists are bringing new ideas to the theory of matchings (see, e.g., https://arxiv.org/pdf/1402.6993.pdf), and some of these ideas have been made rigorous; see [4]. Many other possible nontrivial directions in the theory of matchings have not been fully explored; see [133].
D.5 Super-concentration in the Sense of S. Chatterjee


Given a Gaussian process $(X_t)_{t\in T}$, we have learned in (2.10.6) that the fluctuations of $\sup_{t\in T}X_t$ are typically not larger than $\sigma:=\sup_{t\in T}(\mathrm{E}X_t^2)^{1/2}$. However, it turns out that in many cases, these fluctuations are of a lower order. Probably the simplest case is that if $(g_i)_{i\le N}$ are standard independent Gaussian r.v.s, then $\max_{i\le N}g_i$ has fluctuations of order $1/\sqrt{\log N}$ (an easy exercise). A more elaborate example is as follows. Denoting by T the unit sphere of $\mathbb{R}^n$, let us set $Y_t=\sum_{i\le n}t_ig_i$ and $X_t=\sum_{i,j\le n}t_it_jg_{ij}$ where $g_i,g_{ij}$ are i.i.d. standard Gaussian r.v.s. The fluctuations of $\sup_{t\in T}Y_t$ are of order 1, but the fluctuations of $\sup_{t\in T}X_t$ are known to be of order $n^{-1/6}$. In such cases, one says that one has super-concentration. Sourav Chatterjee [25] has discovered (among many other things) that super-concentration is related to the fact that the function $t\mapsto X_t$ has typically many near maxima (which is not the case for the function $t\mapsto Y_t$).
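The "easy exercise" about $\max_{i\le N}g_i$ can be observed numerically. The following Monte Carlo sketch is our own illustration (the sample sizes are arbitrary choices): it estimates the standard deviation of $\max_{i\le N}g_i$ for a few values of N. The decrease is slow, consistent with the order $1/\sqrt{\log N}$, and far below the bound σ = 1.

```python
import random
import statistics


def std_of_max(n, trials, rng):
    """Empirical standard deviation of max_{i<=n} g_i over `trials` independent draws."""
    maxima = [max(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(trials)]
    return statistics.pstdev(maxima)


rng = random.Random(2021)
for n in (1, 10, 100, 1000):
    print(n, round(std_of_max(n, 1000, rng), 3))
```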


D.6 High-Dimensional Statistics 


An important and rapidly expanding area of research is High-Dimensional Statistics, 
where the goal is to study large amounts of data, living in a high-dimensional space. 
Because high-dimensional data appears in important and diverse applications (e.g., 
DNA sequencing, image and speech recognition, autonomous car systems, Internet 
search engines, and many more), a well-established theoretical understanding of the 
area is of the utmost importance. 

Chaining plays a key role in High-Dimensional Statistics: The analysis of 
statistical recovery procedures often calls for the study of the supremum (or 
infimum) of certain random processes, and that is where chaining methods take 
center stage. 

To give a flavor of the problems one encounters, consider the incredibly important area of compressed sensing (used, e.g., in MRI imaging and remote sensing (radar)). One wishes to recover an unknown signal (vector or function), living in a very high-dimensional space. The signal is sparse: Its expansion with respect to some natural basis has only a few nonzero coefficients.¹ One receives, as data, relatively few linear measurements of the signal, and those may be further corrupted by noise. The goal is to use the given data to construct a good approximation of the unknown signal. A general introduction to this topic can be found in [34], and a recent example of the way chaining is used can be found in [74].

The novelty in compressed sensing was the realization that under rather minimal assumptions, a rather small number of measurements suffices to generate a good approximation of the signal: this number scales essentially linearly in the degree of sparsity and only logarithmically in the dimension of the space in which the signal lives. Moreover, recovery can be performed in an efficient way computationally.

¹ The degree of sparsity is then the number of nonzero components.

A simplified way of viewing recovery procedures is as follows: If $t_0$ is the unknown vector, and $\langle t_0,X_1\rangle,\ldots,\langle t_0,X_N\rangle$ are the given (noise-free) linear measurements, a reasonable guess of an approximating vector would be some $t$ that is also sparse and such that $(\langle t,X_i\rangle)_{i=1}^N$ is close to the given measurements.² The success of recovery is based on the fact that if $x$ is any sparse vector that is far away from $t_0$, that fact will be exhibited in the value of $N^{-1}\sum_{i=1}^N\langle x-t_0,X_i\rangle^2$; to that end, one has to study the behavior of the quadratic empirical process $u\mapsto N^{-1}\sum_{i=1}^N\bigl(\langle u,X_i\rangle^2-\mathrm{E}\langle u,X_i\rangle^2\bigr)$ on the set of sufficiently sparse vectors [73]. Related topics may be found in Sects. 14.2 and 14.3.

Empirical processes often occur when dealing with high-dimensional data. Two generic examples of empirical processes are the centered product empirical process indexed by two classes of functions, $\mathcal{F}$ and $\mathcal{H}$, that is, $(f,h)\mapsto N^{-1}\sum_{i=1}^Nf(X_i)h(X_i)-\mathrm{E}fh$, and the centered multiplier empirical process indexed by the class $\mathcal{F}$, that is, $f\mapsto N^{-1}\sum_{i=1}^N\xi_if(X_i)-\mathrm{E}\xi f$, with $(\xi_i)_{i=1}^N$ being a random vector and $(X_1,\ldots,X_N)$ selected randomly according to some procedure (e.g., independent sampling). The study of these two processes appears in diverse applications besides sparse recovery. General references are [55, 138, 142], and [71] presents a few more interesting examples of random processes one encounters in High-Dimensional Statistics and the chaining arguments that are used in their analysis.


² Sparse recovery procedures do not require preliminary information on the degree of sparsity of $t_0$. In the very basic setup, the procedure selects $t$ that agrees with $t_0$ on the sample points and has the smallest possible $\ell_1$ norm. And if $t_0$ happens to be sparse, one can show that so will be the vector selected by this procedure (see, e.g., [139] and [91]).


Appendix E 
Research Directions 


It seems worthwhile to recapitulate some of the research problems we stressed in this book. It is very risky to attempt to evaluate the potential of a research problem, but we will try.¹


E.1 The Latała-Bednorz Theorem


Find a proof that you can explain to your grandmother. It is hard to understand why 
the current proof works. This is a pity. The core material of this book, the theory 
presented in Chaps. 2 to 13, is rather beautiful, with the exception of that proof. 
Ideally, one wishes for a new conceptual idea. 


E.2 The Ultimate Matching Conjecture 


It is stated in Problem 17.1.2. Possibly, it is only a hard combinatorial problem to crack, which requires understanding the geometry of certain classes of functions, and whose solution would not open new horizons.


¹ This author's most notable contribution to mathematics, the discovery of new directions for concentration inequalities, started by studying a problem of secondary importance.
E.3 My Favorite Lifetime Problem


It is explained in Sect. 13.3. There is no telling how difficult this is and what new 
horizons a positive solution would open. It is reserved for the very ambitious. 


E.4 From a Set to Its Convex Hull 


The general problem is to understand geometrically how the smallness of a set, in the sense of certain γ functionals, transfers to its convex hull. The first and most important occurrence of this problem² is Problem 2.11.2. Further occurrences are Problem 8.3.5 and Problem 8.3.12. But it is not certain that this is possible. Also related are Problems 2.11.13 and 2.11.14.


² It is while discussing this problem with Keith Ball that I invented the generic chaining.


Appendix F 
Solutions of Selected Exercises 


As the purpose of the exercises is to have the reader (rather than the author) work, the solutions are sketchy and have not been worked out with the same dedication as the rest of this book. Therefore, expect much lousiness and some plain nonsense.

Exercise 1.3.1 Considering for each (s, t) the largest k with $d(s,t)\le2^{-k}$ yields

$$\sup_{s,t\in G}\frac{|X_s-X_t|^p}{d(s,t)^\beta}\le\sum_{k\ge0}2^{\beta(k+1)}\sup_{s,t\in G,\,d(s,t)\le2^{-k}}|X_s-X_t|^p.$$

Since $\mathrm{E}\sup_{s,t\in G,\,d(s,t)\le2^{-k}}|X_s-X_t|^p\le K(m,p,\alpha)2^{k(m-\alpha)}$ by (1.10), taking expectation yields (1.12) since $\beta+m-\alpha<0$.

Exercise 1.3.2 By Jensen's inequality, we have $\varphi(\mathrm{E}\max_iV_i)\le\mathrm{E}\varphi(\max_iV_i)$. Furthermore, $\varphi(\max_iV_i)\le\sum_i\varphi(V_i)$ so that $\mathrm{E}\varphi(\max_iV_i)\le\sum_i\mathrm{E}\varphi(V_i)$.

Exercise 1.3.3 It follows from (1.13) and (1.14) that the r.v. $Y_n$ of (1.5) satisfies $\mathrm{E}Y_n/c_n\le\varphi^{-1}(K(m)2^{nm}d_n)$, and (1.15) follows by combining with (1.7).

Exercise 1.4.3 The distance d associated to Brownian motion is given by $d(s,t)=\sqrt{|s-t|}$ and $N([0,1],d,\epsilon)\le L\epsilon^{-2}$. The condition $|s-t|\le\delta$ implies $d(s,t)\le\sqrt\delta$. Dudley's bound is then $L\int_0^{\sqrt\delta}\sqrt{\log(L/\epsilon^2)}\,d\epsilon\le L\sqrt{\delta\log(2/\delta)}$.

Exercise 2.2.2 Just use that $|X_t|\le|X_t-X_{t_0}|+|X_{t_0}|\le\sup_{s,t}|X_s-X_t|+|X_{t_0}|$.

Exercise 2.3.1 Because $\mathrm{P}(Y\ge a\mathrm{E}Y)\le1/a$ by Markov's inequality.

Exercise 2.3.3 (a) This means that given $L_1>0$, there exists $L_2$ such that $\sup_x(xy-L_1x^2)\le y^2/L_2$, which is proved by computing this supremum. (b) Let us then assume that $p(u)\le L_1\exp(-u^2/L_1)$ for $u\ge L_1$. Given a parameter A, for $Au\ge L_1$, we have $p(Au)\le L_1\exp(-A^2u^2/L_1)$. Also, we have $p(Au)\le1$ so that $p(Au)\le2\exp(-u^2)$ for $u\le\sqrt{\log2}$. Assuming that $A\sqrt{\log2}\ge L_1$, it suffices that $L_1\exp(-A^2u^2/L_1)\le2\exp(-u^2)$ for $u\ge\sqrt{\log2}$. This is true as soon as $L_1\le2\exp(u^2(A^2/L_1-1))$ for $u\ge\sqrt{\log2}$, and in particular as soon as A is large enough that $L_1\le2\exp(\log2\,(A^2/L_1-1))$. (c) Taking logarithms, it suffices to prove that for $x>0$ and a constant $L_1$, one has $L_1x-x^{3/2}/L_1\le L_2-x^{3/2}/L_2$ for a certain constant $L_2$. Assuming first $L_2\ge2L_1$, it suffices to find $L_2$ for which $L_1x-x^{3/2}/(2L_1)\le L_2$, which is obvious.

Exercise 2.3.5 (a) Since the relation $\mathrm{P}(\sup_{k\le N}|g_k|\ge u)\le2N\exp(-u^2/2)$ does not require the r.v.s to be Gaussian. (b) Consider sets $\Omega_k$ with $\mathrm{P}(\Omega_k)=1/N$, which we split into two sets $\Omega_{k,1}$ and $\Omega_{k,2}$ of probability $1/(2N)$. The centered r.v.s $g_k=\sqrt{\log N}(\mathbf{1}_{\Omega_{k,1}}-\mathbf{1}_{\Omega_{k,2}})$ satisfy $\mathrm{P}(|g_k|\ge u)\le\exp(-u^2)$ because the left-hand side is 0 for $u>\sqrt{\log N}$ and $1/N$ for $u\le\sqrt{\log N}$. When the sets $\Omega_k$ are disjoint, $\sup_{k\le N}g_k=\sqrt{\log N}$ on $\bigcup_{k\le N}\Omega_{k,1}$ and is zero elsewhere. Thus, $\mathrm{E}\sup_{k\le N}g_k=\sqrt{\log N}\,\mathrm{P}(\bigcup_{k\le N}\Omega_{k,1})=\sqrt{\log N}/2$. (c) When the sets $\Omega_{k,1}$ are independent, it is still true that $\sup_{k\le N}g_k=\sqrt{\log N}$ on $\bigcup_{k\le N}\Omega_{k,1}$, and by (2.18), this union has probability $\ge1/L$. Next, $\sup_{k\le N}g_k\ge0$ except on the set $\bigcap_{k\le N}\Omega_{k,2}$ where this supremum is $-\sqrt{\log N}$, and this set has probability $\le1/(2N)$. Thus, $\mathrm{E}\sup_{k\le N}g_k\ge\sqrt{\log N}(1/L-1/(2N))$.

Exercise 2.3.6 $\mathrm{P}(\bigcup_{k\le N}A_k)=1-\prod_{k\le N}(1-\mathrm{P}(A_k))$ and $1-x\le\exp(-x)$ for $x\ge0$.

Exercise 2.3.7 Use (b) for u such that $\mathrm{P}(g_1\ge u)=1/N$, so that u is about $\sqrt{\log N}$.

Exercise 2.3.8 (a) We have $\mathrm{E}\exp(Y/(2B))=\int_0^\infty\mathrm{P}(\exp(Y/(2B))\ge u)\,du=1+\int_1^\infty\mathrm{P}(Y\ge2B\log u)\,du\le1+2\int_1^\infty u^{-2}\,du=3$. Calculus shows that $(x/a)^a$ takes its maximum at $a=x/e$, so $(x/p)^p\le\exp(x/e)$. Using this for $x=Y/B$ and taking expectations yield the result. The rest is obvious. (b) follows by using (a) for the variable $Y^2$. (c) If $\mathrm{E}Y^p\le p^pB^p$, Markov's inequality yields $\mathrm{P}(Y\ge u)\le(Bp/u)^p$, and for $u\ge Be$, one takes $p=u/(Be)$ to get a bound $\exp(-u/(Be))$.

Exercise 2.3.9 Given any value of $x>0$, we have $(\mathrm{E}|g|^p)^{1/p}\ge x\,\mathrm{P}(|g|\ge x)^{1/p}$, and for $x=\sqrt p$, one has $\mathrm{P}(|g|\ge\sqrt p)^{1/p}\ge1/L$.
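Both quantities in this solution are explicit for a standard Gaussian, since $\mathrm{E}|g|^p=2^{p/2}\Gamma((p+1)/2)/\sqrt\pi$ and $\mathrm{P}(|g|\ge x)=\operatorname{erfc}(x/\sqrt2)$. The sketch below (our illustration, not part of the text) evaluates their ratios to $\sqrt p$, showing that $(\mathrm{E}|g|^p)^{1/p}$ is of order $\sqrt p$ and that the lower bound obtained with $x=\sqrt p$ loses only a constant factor.

```python
import math


def gaussian_moment_root(p):
    """(E|g|^p)^{1/p} for standard Gaussian g, from E|g|^p = 2^{p/2} Gamma((p+1)/2) / sqrt(pi)."""
    log_moment = 0.5 * p * math.log(2.0) + math.lgamma((p + 1) / 2) - 0.5 * math.log(math.pi)
    return math.exp(log_moment / p)


def tail_lower_bound(p):
    """x * P(|g| >= x)^{1/p} at x = sqrt(p), with P(|g| >= x) = erfc(x / sqrt(2))."""
    x = math.sqrt(p)
    return x * math.erfc(x / math.sqrt(2.0)) ** (1.0 / p)


for p in (1, 2, 10, 100):
    print(p, gaussian_moment_root(p) / math.sqrt(p), tail_lower_bound(p) / math.sqrt(p))
```

Working with `math.lgamma` rather than `math.gamma` keeps the computation stable for large p.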

Exercise 2.4.1 We have $d(t_1,T_n)=0$ for each n. For $k\ge2$, let $n(k)$ be the smallest integer with $N_{n(k)}\ge k$, so that $N_{n(k)-1}<k$ and thus $2^{n(k)}\le L\log k$. Furthermore, $d(t_k,T_n)\le d(t_k,t_1)$ for $n<n(k)$ and $d(t_k,T_n)=0$ for $n\ge n(k)$ since then $t_k\in T_n$. Thus, $\sum_{n\ge0}2^{n/2}d(t_k,T_n)\le L2^{n(k)/2}d(t_k,t_1)\le L2^{n(k)/2}/\sqrt{\log k}\le L$.

Exercise 2.5.3 Take $T=\{-1,0,1\}$, $U=\{-1,1\}$, $d(x,y)=|x-y|$, so $e_0(T)=1$, $e_0(U)=2$.

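Exercise 2.5.4 For $\epsilon>e_0(T)$, we have $N(T,d,\epsilon)=1$. For $\epsilon>e_n(T)$, we have $N(T,d,\epsilon)\le N_n$. Consequently,

$$\int_0^\infty\sqrt{\log N(T,d,\epsilon)}\,d\epsilon=\sum_{n\ge0}\int_{e_{n+1}(T)}^{e_n(T)}\sqrt{\log N(T,d,\epsilon)}\,d\epsilon\le\sum_{n\ge0}e_n(T)\sqrt{\log N_{n+1}}$$

and the result holds since $\log N_{n+1}\le2^{n+1}$.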
Exercise 2.5.5 If (A/eé,)* = log Nn, then en(T) < €p. 
Exercise 2.5.6 Indeed, if $n_0$ is the smallest integer with $\operatorname{card}T\le N_{n_0}$, then $\sum_{n\ge0}2^{n/2}e_n(T)=\sum_{n<n_0}2^{n/2}e_n(T)$, and there are about $\log\log\operatorname{card}T$ terms in this sum. Furthermore, we control each term of the sum: For each k, we have $\sup_{t\in T}\sum_{n\ge0}2^{n/2}d(t,T_n)\ge2^{k/2}\sup_{t\in T}d(t,T_k)\ge2^{k/2}e_k(T)$.

Exercise 2.5.8 Consider a subset W of T maximal with respect to the property that $s,t\in W,\ s\ne t\Rightarrow d(s,t)>2\epsilon$. Since the balls of radius ε centered at the points of W are disjoint, each of measure $\ge a$, we have $\operatorname{card}W\le1/a$, and the balls centered at the points of W and of radius 2ε cover U by Lemma 2.5.7.

Exercise 2.5.9 (a) Consider a subset W of T which is maximal with respect to the property that any two of its points are at a distance $\ge\epsilon$. Then the balls centered on W of radius ε cover T. Furthermore, two points within a ball of radius ε are at mutual distance $\le2\epsilon$. (b) If A is covered by N balls of radius ε, these balls have the same volume $\mathrm{Vol}(\epsilon B)$, so $\mathrm{Vol}(A)\le N\,\mathrm{Vol}(\epsilon B)$. Consider $W\subset A$ as in (a) but for 2ε rather than ε. The open balls of radius ε centered at the points of W are disjoint and entirely contained in $A+\epsilon B$, so that $\operatorname{card}W\,\mathrm{Vol}(\epsilon B)\le\mathrm{Vol}(A+\epsilon B)$. This proves (2.45). (c) Since $\mathrm{Vol}(\epsilon B)=\epsilon^k\mathrm{Vol}(B)$. (d) If $\epsilon_n$ is defined by $(1/\epsilon_n)^k=N_n$ and $\epsilon'_n$ by $(1+2/\epsilon'_n)^k=N_n$, then by (c), $\epsilon_n\le e_n(B)\le\epsilon'_n$. Then $e_n(B)$ is of order 1 for $2^n\le k$ and of order $2^{-2^n/k}$ for $2^n\ge k$. As a result, $\sum_{n\ge0}2^{n/2}e_n(B)=\sum_{2^n\le k}2^{n/2}e_n(B)+\sum_{2^n>k}2^{n/2}e_n(B)\le L\sqrt k$. (e) We cover T with $N_{n_0}$ balls of radius $2e_{n_0}(T)$ and each of these by $N_n$ balls of radius $\le L2^{-2^n/k}e_{n_0}(T)$. There are at most $N_{n_0}N_n\le N_{n+1}$ of these balls, and this proves the claim.

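Exercise 2.5.10 (a) We have $\binom{m}{\ell}=\frac{\ell+1}{m-\ell}\binom{m}{\ell+1}$, and for $\ell\le k-1$, we have $(\ell+1)/(m-\ell)\le2k/m$. Iteration of this relation shows that for $\ell\le k$, we have $\binom{m}{\ell}\le(2k/m)^{k-\ell}\binom{m}{k}$. When $k\le m/4$, we have $2k/m\le1/2$, and thus, $\sum_{\ell\le k/2}(2k/m)^{k-\ell}\le2(2k/m)^{k/2}$. This proves (2.48). (b) We have $\operatorname{card}\mathcal{I}=\binom{m}{k}$. On $\mathcal{I}$, consider the distance $d(I,J):=\operatorname{card}(I\,\Delta\,J)$ where $I\,\Delta\,J$ is the symmetric difference $(I\setminus J)\cup(J\setminus I)$. We bound from above the cardinality of a ball of $\mathcal{I}$ of radius k/2. Given $I\in\mathcal{I}$, a set J is entirely determined by $I\,\Delta\,J$. Thus, the number of sets $J\in\mathcal{I}$ for which $\operatorname{card}(I\,\Delta\,J)\le k/2$ is bounded by $\sum_{0\le\ell\le k/2}\binom{m}{\ell}$. Thus, (2.48) shows that $\operatorname{card}B(I,k/2)\le2(2k/m)^{k/2}\operatorname{card}\mathcal{I}$ and $N(\mathcal{I},d,k/2)\ge(m/(2k))^{k/2}/2$, which is the desired result.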
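The counting bound used in (b), namely $\sum_{\ell\le k/2}\binom{m}{\ell}\le2(2k/m)^{k/2}\binom{m}{k}$ for $1\le k\le m/4$, can be checked exactly for small parameters. The following sketch is ours: it uses exact rational arithmetic and squares both sides after dividing by $2\binom{m}{k}$, so that no irrational power appears when k is odd. It is a finite check, not a proof.

```python
from fractions import Fraction
from math import comb


def check_248(m, k):
    """Exact check of sum_{l <= k/2} C(m, l) <= 2 (2k/m)^{k/2} C(m, k) for 1 <= k <= m/4.

    Both sides are divided by 2 C(m, k) and squared, so that only rational
    numbers occur even when k is odd.
    """
    lhs = sum(comb(m, l) for l in range(k // 2 + 1))
    ratio = Fraction(lhs, 2 * comb(m, k))
    return ratio * ratio <= Fraction(2 * k, m) ** k


print(all(check_248(m, k) for m in range(4, 120) for k in range(1, m // 4 + 1)))
# prints: True
```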

Exercise 2.5.11 Consider an integer k with $k/m\le1/4$. It follows from Exercise 2.5.10 that $N(T_k,d,1/\sqrt{2k})\ge(m/(2k))^{k/2}/2$. In particular, for $1\le k\le\sqrt m$, we have $\log N(T,d,1/(L\sqrt k))\ge(k/L)\log m$, so that for $m^{-1/4}/L\le\epsilon\le1/L$, we have $\sqrt{\log N(T,d,\epsilon)}\ge(1/(L\epsilon))\sqrt{\log m}$, from which it readily follows that the right-hand side of (2.41) is $\ge(\log m)^{3/2}/L$.

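Exercise 2.7.5 Consider the smallest integer $n_0$ with $\operatorname{card}T\le N_{n_0}$, so that $2^{n_0/\alpha}\le K(\log\operatorname{card}T)^{1/\alpha}$. Take $\mathcal{A}_n=\{T\}$ for $n<n_0$ and $\mathcal{A}_n$ consisting of the sets $\{t\}$ for $t\in T$ when $n\ge n_0$.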

Exercise 2.7.6 (a) We enumerate $\mathcal{B}_n$ as $(B_\ell)_{\ell\le N_n}$. The sets $C_\ell=B_\ell\setminus\bigcup_{\ell'<\ell}B_{\ell'}$ then provide a partition $\mathcal{C}_n$ of T of cardinality $\le N_n$, and $C_\ell\subset B_\ell\in\mathcal{B}_n$. We define $\mathcal{A}_0=\{T\}$ and for $n\ge1$ define $\mathcal{A}_n$ as the partition generated by $\mathcal{A}_{n-1}$ and $\mathcal{C}_{n-1}$, so that $\operatorname{card}\mathcal{A}_n\le N_{n-1}^2=N_n$, and for $n\ge1$, each element A of $\mathcal{A}_n$ is contained in an element of $\mathcal{C}_{n-1}$ and thus in an element of $\mathcal{B}_{n-1}$. (b) We can cover T by $N_n$ balls of radius $\le2e_n(T)$. We use these balls as the covering $\mathcal{B}_n$. Then each element of $\mathcal{A}_n$ has diameter $\le4e_{n-1}(T)$.
Exercise 2.7.7 We set $\mathcal{A}_0=\{T\}$ and for $n\ge1$, $\mathcal{A}_n$ consists of the sets of the type $B\cap C$ for $B\in\mathcal{B}_{n-1}$ and $C\in\mathcal{C}_{n-1}$. There are at most $N_{n-1}^2\le N_n$ such sets.
Exercise 2.7.8 (a) Consider an admissible sequence $(\mathcal{A}_n)$ with

$$\sup_{t\in T}\sum_{k\ge0}2^{k/2}\Delta(A_k(t))\le2\gamma_2(T,d).$$

In particular, for each $A\in\mathcal{A}_n$, we have $\Delta(A)\le2^{-n/2+1}\gamma_2(T,d)$. Thus, A is contained in a ball of radius $\le2^{-n/2+1}\gamma_2(T,d)$. As these sets A for $A\in\mathcal{A}_n$ cover T, we have $e_n(T)\le2^{-n/2+1}\gamma_2(T,d)$. (b) Consider $\epsilon>0$ with $N(T,d,\epsilon)>1$, and the smallest n with $N(T,d,\epsilon)\le N_n$, so that $\sqrt{\log N(T,d,\epsilon)}\le L2^{n/2}$. By definition of n, we have $N_{n-1}<N(T,d,\epsilon)$, so that by definition of $e_{n-1}(T)$, we have $\epsilon\le e_{n-1}(T)$, and $\epsilon\sqrt{\log N(T,d,\epsilon)}\le L2^{n/2}e_{n-1}(T)\le L\gamma_2(T,d)$.

Exercise 2.7.9 Consider the smallest integer $n_0$ such that $2^{n_0}\ge m$. By Exercise 2.5.9 (e), for $n\ge n_0$, we have $e_{n+1}(T)\le L2^{-2^n/m}e_{n_0}(T)$, so that $\sum_{n>n_0}2^{n/2}e_n(T)\le L2^{n_0/2}e_{n_0}(T)$. Then

$$\sum_{n\ge0}2^{n/2}e_n(T)\le L\sum_{n\le n_0}2^{n/2}e_n(T)\le L\log(m+1)\,\gamma_2(T,d)$$

because each term of the middle summation is at most $L\gamma_2(T,d)$, and there are $n_0+1\le L\log(m+1)$ terms.

Exercise 2.7.10 It was already shown in Exercise 2.5.11 that the estimate (2.58) is optimal, but the present construction is easier to visualize. According to (2.45), in $\mathbb{R}^m$ there exists a set of cardinality $2^m$ in the unit ball consisting of points at mutual distance $\ge1/2$. Denote by $n_0$ the largest integer with $2^{n_0}\le m$. Thus, for $n<n_0$, there exists a set $T_n$ of cardinality $N_{n+1}$ consisting of points within distance $2^{-n/2+1}$ of the origin but at mutual distances $\ge2^{-n/2}$. Set $T=\bigcup_{n<n_0}T_n$. For $n<n_0$, among any $N_n$ balls covering T, one must contain two points of $T_n$, so that $e_n(T)\ge2^{-n/2}/2$. Consequently, $\sum_{n\ge0}2^{n/2}e_n(T)\ge n_0/2\ge(\log m)/L$. One can prove that $\gamma_2(T,d)\le L$ by proceeding as in Exercise 2.4.1.

Exercise 2.7.12 The inequality (2.33) never used that the process is centered, so it remains true, and combining it with (2.6) implies $\mathrm{E}\sup_{t\in T}|X_t-X_{t_0}|\le LS$, and the result.

Exercise 2.8.2 For example, T consists of a sequence of real numbers which 
converges fast to 0, for example, T = {1/n!; n > 1}. For each value of r, the largest 
value of m such that (2.76) holds is finite. 

Exercise 2.9.2 The growth condition is satisfied because $m=N_n\ge N_1=4$, and there exists no separated family. In this case, the inequality $\gamma_2(T,d)\le LrF(T)/c^*$ is false because the left-hand side is positive and the right-hand side is zero.

Exercise 2.10.4 Let $a=\min_{p\le m}(\mathrm{E}X_p^2)^{1/2}>0$ where the r.v.s $(X_p)_{p\le m}$ are independent Gaussian. (Observe that it is not required that the numbers $\mathrm{E}X_p^2$ be all equal.) Consider a number $c_p$ such that $\mathrm{P}(X_p\ge c_p)=1/m$, so that $c_p\ge a\sqrt{\log m}/L$. The m sets $\Omega_p=\{X_p\ge c_p\}$ are independent of probability $1/m$, so that their union Ω is of probability $1-(1-1/m)^m\ge1/L$. Given $\omega\in\Omega$, there is $p\le m$ with $\omega\in\Omega_p$, so that $X_p\ge c_p$ and thus $\mathbf{1}_\Omega\max_{p\le m}X_p\ge\mathbf{1}_\Omega\min_{p\le m}c_p$, and consequently, by taking expectation, $\mathrm{E}\mathbf{1}_\Omega\max_{p\le m}X_p\ge(a/L)\sqrt{\log m}$. On the other hand, $\max_{p\le m}X_p\ge-|X_1|$, and since we may assume without loss of generality that $\mathrm{E}X_1^2=a^2$, we get $\mathrm{E}\mathbf{1}_{\Omega^c}\max_pX_p\ge-\mathrm{E}|X_1|\ge-La$. Consequently, $\mathrm{E}\max_{p\le m}X_p\ge a((1/L)\sqrt{\log m}-L)$, the desired result.


Exercise 2.10.7 Then $\sigma=1$ and $Y=\sup_{t\in T}X_t=(\sum_{i\le n}g_i^2)^{1/2}$ is about $\sqrt n$. Here, you can visualize the fluctuations of Y: $\mathrm{E}Y^4-(\mathrm{E}Y^2)^2\le Ln$, so that the fluctuations of $Y^2$ are of order $\sqrt n$, and since $Y^2$ is of order n, the fluctuations of Y are of order 1. (Remember that $\sqrt{A+a}-\sqrt A$ is of order $a/\sqrt A$.)
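A Monte Carlo sketch of this last claim (our illustration, with arbitrary sample sizes): the mean of $Y=(\sum_{i\le n}g_i^2)^{1/2}$ grows like $\sqrt n$, while its standard deviation stays of order 1; in fact it approaches $1/\sqrt2\approx0.707$.

```python
import math
import random
import statistics


def norm_fluctuation(n, trials, rng):
    """Mean and standard deviation of Y = (sum_{i<=n} g_i^2)^{1/2} over `trials` draws."""
    ys = [math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)))
          for _ in range(trials)]
    return statistics.fmean(ys), statistics.pstdev(ys)


rng = random.Random(3)
for n in (10, 100, 1000):
    mean, std = norm_fluctuation(n, 1000, rng)
    print(n, round(mean / math.sqrt(n), 3), round(std, 3))
```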

Exercise 2.11.7 Enumerate $T\cup\{0\}$ as a sequence $(x_i)_{i\ge1}$ with $x_1=0$ and $d(x_1,x_i)\le L/\log i$.

Exercise 2.11.10 Look at the case where T consists of one single point. 

Exercise 2.11.11 The first part is obvious. For the second part, simply replace 
(2.134) by t = Pps 1 ta) — Hn-10) = Lys An unl). 

Exercise 2.11.12 By homogeneity, we may assume that $S=\mathrm{E}\sup_tX_t=1$. Consider an integer $n_0$ with $2^{-n_0/2}\simeq\delta$. Starting with an admissible sequence $(\mathcal{B}_n)$ of T such that $\sup_t\sum_{n\ge0}2^{n/2}\Delta(B_n(t))\le LS$, we consider the admissible sequence $(\mathcal{A}_n)$ given by $\mathcal{A}_n=\mathcal{B}_n$ if $n\ge n_0$ and $\mathcal{A}_n=\{T\}$ for $n<n_0$. Then $\sup_t\sum_{n\ge0}2^{n/2}\Delta(A_n(t))\le LS$. We set $T_n=\{0\}$ for $n<n_0$ and $U_n=\emptyset$ for $n<n_0$, and we follow the proof of Theorem 2.11.9. For $n\ge n_0$ and $u\in U_n$, we have $\|u\|\le L2^{-n/2}\le L\delta$.

Exercise 2.11.15 To prove (a), we observe that 


Well = WWil = sup { Drewes Dox?} = [Yre2, = va. 


i<n i<n i<n 


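If $\|x\|\le1$, then (2.140) implies $\mathrm{E}\exp(\|U_q(x)\|^2/4)\le L^q$, so that $\mathrm{P}(\|U_q(x)\|\ge a\sqrt q)\le\exp((L-a^2/4)q)$ by Markov's inequality, and (2.141) follows by homogeneity. Consider now a sequence $t_k\in\mathbb{R}^n$ such that $\|t_k\|\le\min(\delta,L\Delta/\sqrt{\log(k+1)})$. Then for $w\ge1$,

$$\mathrm{P}\bigl(\|U_q(t_k)\|\ge Lw\delta\sqrt q\bigr)=\mathrm{P}\bigl(\|U_q(t_k)\|\ge Lv\|t_k\|\sqrt q\bigr)\le\exp(-v^2q),$$

where $v=w\delta/\|t_k\|\ge w$, and thus,

$$\sum_{k\ge1}\mathrm{P}\bigl(\|U_q(t_k)\|\ge Lw\delta\sqrt q\bigr)\le\sum_{k\ge1}\min\bigl(\exp(-w^2q),\exp(-w^2\delta^2q\log(k+1)/\Delta^2)\bigr),$$

and when $\delta^2q/\Delta^2\ge1$, this is $\le\exp(-w^2q/L)$. The result follows since we can find such a sequence $(t_k)$ for which $T\subset L\operatorname{conv}\{t_k,k\ge1\}$ and $\Delta=\mathrm{E}\sup_{t\in T}X_t\le\delta\sqrt q$.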
Exercise 2.11.17 (a) The subsets of T obtained by fixing the coordinates of t in each $T_k$ for $k\in I^c$ have diameter $\le\epsilon$, and there are $\prod_{k\in I^c}M_k$ such sets. (b) Take $\eta_k=\sqrt{\log M_k}$ and use (a) to see that $\sqrt{\log N(T,d,\epsilon)}\le S(\epsilon)$. (c)
We reduce to the case )°,-y «xm = 1 by replacing «, by Ae; for a suitable A. 
We set a, = €xnx, and we consider the probability 2 on by , N} such that 
w({k}) = a. Define h(k) = ex/ne. Then fp hdu = pe, & abl Spe l/hdu = 
Depo i (d) For any non-increasing function S on Rt, we have tee S(e)de < 
yo SC fue) The, SC" a Jae (1/h) du. Setting Be = Ae \ Acs, we 


have $(2~*)? < Y°,<¢ Cm where cm = fp (1/h)dy. Thus, §(2~-) < Dn cp Gm 
and 9°, 2~€S(2-*) < 1p 2-8 Yep em < Yim 27" 4/Em- Recalling that the set 


Am is of the type Am = {h < ty} for a certain t,, we have By = {tn41 < h < tn} 
and in particular Cm < 4(Bm)/tm+1.Now, fp hd = 2-22-24) > 2-204) 
so that 2-20"+) < Sz, du < tm&(Bm), and changing m into m + 1, we have 
27" < Li tm41e(Bn+1). Thus, 27” /Cn < LV u(Bn)u(Bn+1). The Cauchy- 
Schwarz inequality then shows that )>,,, 27" ./Cm < L as desired. 

Exercise 2.11.18 (a) First note that if $\epsilon^2=\sum_{k\le N}\epsilon_k^2$, then $N(T,d,\epsilon)\le\prod_{k\le N}N(T_k,d_k,\epsilon_k)$, and then use (2.147) for $f_k(\epsilon)=\sqrt{\log N(T_k,d_k,\epsilon)}$. (b)
Consider a set J with die! 6? < ee Ee = OQifk € I ande, = 0 
otherwise shows that V(e)? < Deere nee Thus, V(e) < S(e). (d) First note that 
Ye2 2 Oe < L i fx (€)de. Denoting by V(e) the quantity corresponding to V (€) 
for the family f,,¢, it then suffices to prove that V(e) < LV(e). Consider a family 
€x.¢ of numbers with Wee Ke < €?. Fork < N, define & by ra =>) eee so 


that )>,€ é? < €?. It suffices to prove that fi (&)? < L ve Tkael€k, ¢)?. Consider 
l= inf{l; fx,e(€x,e) A O} so that Slo) = = pms , and it suffices to prove that 
fx (EZ) < L2-*. But since €& > €x,e for each £, for £ < £, we have Skelex) = 0. 


Thus, f(x) < Dos feel) S Doe g2* SL. 

Exercise 2.13.3 In $\mathbb{R}^m$, the sum $\sum_{n\ge0}2^{n/2}e_n(E)$ has to be replaced by $\sum_{n\le n_0}2^{n/2}e_n(E)$, where $n_0$ is the smallest integer with $2^{n_0}\ge m$ (because, as shown in Exercise 2.5.9 (e), $e_n(E)$ decreases very fast for larger values of n), so $n_0$ is of order $\log(m+1)$. The result is then a consequence of the Cauchy-Schwarz inequality, $\sum_{n\le n_0}b_n\le\sqrt{n_0+1}\,(\sum_{n\le n_0}b_n^2)^{1/2}$ for $b_n=2^{n/2}e_n(E)$. This estimate is optimal in the case where $b_n$ is independent of n. Thus, for ellipsoids, Dudley's entropy integral is off by a factor at most $\sqrt{\log m}$, compared with a factor $\log m$ for general sets.

Exercise 2.15.2 For each $n\ge0$, we can find a set $T_n$ such that $\operatorname{card}T_n\le N_n$ and $d(t,T_n)\le2e_n(T)$ for each t. Then (2.172) holds for $B=2\sum_{n\ge0}2^{n/2}e_n(T)$. Given $\delta>0$, consider the smallest m with $e_m(T)\le\delta$. Then, as we have already seen, $\sum_{n\ge m}2^{n/2}e_n(T)\le LI(\delta)$ where $I(\delta)=\int_0^\delta\sqrt{\log N(T,d,\epsilon)}\,d\epsilon$. On the other hand, since $e_{m-1}(T)>\delta$, for $\epsilon<\delta$ we have $\sqrt{\log N(T,d,\epsilon)}\ge2^{(m-1)/2}/L$, so that $2^{m/2}\delta\le LI(\delta)$. It then follows from (2.173) that with probability $\ge1-\exp(-u^22^m)$, we have $\sup_{d(s,t)\le\delta}|X_s-X_t|\le LuI(\delta)$, and in particular that $\mathrm{E}\sup_{d(s,t)\le\delta}|X_s-X_t|\le LI(\delta)$.
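Exercise 3.1.3 We write $\int_0^\infty f(\epsilon)\,d\epsilon=\sum_{n\ge0}\int_{\epsilon_{n+1}}^{\epsilon_n}f(\epsilon)\,d\epsilon$. For $n\ge1$, we have $2^n\le f(\epsilon)\le2^{n+1}$ for $\epsilon_{n+1}<\epsilon\le\epsilon_n$, so that

$$2^n(\epsilon_n-\epsilon_{n+1})\le\int_{\epsilon_{n+1}}^{\epsilon_n}f(\epsilon)\,d\epsilon\le2^{n+1}(\epsilon_n-\epsilon_{n+1})\le2^{n+1}\epsilon_n.$$

Also, $f(\epsilon)\le2$ for $\epsilon>\epsilon_1$, so that the previous upper bound remains true for n = 0 and $\int_0^\infty f(\epsilon)\,d\epsilon\le2\sum_{n\ge0}2^n\epsilon_n$. For the lower bound, we observe that $\sum_{n\ge0}2^n\epsilon_{n+1}=\sum_{n\ge1}2^{n-1}\epsilon_n$, so that $\sum_{n\ge0}2^n(\epsilon_n-\epsilon_{n+1})\ge\frac12\sum_{n\ge0}2^n\epsilon_n$.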
Exercise 3.1.6 (a) It is obvious that if $\epsilon^2=\sum_{k\le N}\epsilon_k^2$, then $B_d(t,\epsilon)\supset\prod_{k\le N}B_{d_k}(t_k,\epsilon_k)$, so that $\mu(B_d(t,\epsilon))\ge\prod_{k\le N}\mu_k(B_{d_k}(t_k,\epsilon_k))$, and consequently, $\log(1/\mu(B_d(t,\epsilon)))\le\sum_{k\le N}\log(1/\mu_k(B_{d_k}(t_k,\epsilon_k)))$, so that the desired result is a consequence of (2.147) used for the functions $f_k(\epsilon)=\sqrt{\log(1/\mu_k(B_{d_k}(t_k,\epsilon)))}$. (b) should be obvious from (a) and the equivalence of (3.20) and $\gamma_2(T,d)$ for any metric space.

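Exercise 3.4.2 Just by considering the term n = 0 of the sum, it is obvious that $\chi_2(T,d)\ge\Delta(T)$, so it suffices to prove the following growth condition. Consider $n\ge1$, $m=N_n$, a number $a>0$, points $t_\ell\in T$ with $d(t_\ell,t_{\ell'})\ge6a$ for $\ell\ne\ell'$ and sets $H_\ell\subset B(t_\ell,a)$. Then $\chi_2(\bigcup_{\ell\le m}H_\ell,d)\ge a2^{n/2}/L+\min_{\ell\le m}\chi_2(H_\ell,d)$. Given $\epsilon>0$, consider for $\ell\le m$ a probability measure $\mu_\ell$ on $H_\ell$ such that for each admissible sequence $(\mathcal{B}_k)$ of partitions of $H_\ell$ we have $\int_{H_\ell}\sum_{k\ge0}2^{k/2}\Delta(B_k(t))\,d\mu_\ell(t)\ge\chi_2(H_\ell,d)-\epsilon$. Consider the probability measure $\mu=m^{-1}\sum_{\ell\le m}\mu_\ell$. Our goal is to prove that for any admissible sequence $(\mathcal{A}_k)$ of partitions of $H=\bigcup_{\ell\le m}H_\ell$, we have

$$\int_H\sum_{k\ge0}2^{k/2}\Delta(A_k(t))\,d\mu(t)\ge a2^{n/2}/L+\min_{\ell\le m}\chi_2(H_\ell,d)-\epsilon. \tag{F.1}$$

For $t\in H$, let $\ell(t)$ be the unique integer $\ell\le m$ such that $t\in H_{\ell(t)}$. For $t\in H$, let us define $f(t)=\Delta(A_{n-1}(t))-\Delta(A_{n-1}(t)\cap H_{\ell(t)})$, so that

$$\int_H\sum_{k\ge0}2^{k/2}\Delta(A_k(t))\,d\mu(t)\ge2^{(n-1)/2}\int_Hf(t)\,d\mu(t)+\frac1m\sum_{\ell\le m}U_\ell,$$

where $U_\ell=\int_{H_\ell}\sum_{k\ge0}2^{k/2}\Delta(A_k(t)\cap H_\ell)\,d\mu_\ell(t)\ge\chi_2(H_\ell,d)-\epsilon$ because if $\mathcal{A}_k^\ell=\{A\cap H_\ell;\ A\in\mathcal{A}_k\}$, then the sequence $(\mathcal{A}_k^\ell)$ is an admissible sequence of partitions of $H_\ell$. Thus, it suffices to prove that $\int_Hf(t)\,d\mu(t)\ge a/L$. Let $B=\bigcup\{A\in\mathcal{A}_{n-1};\ \exists\ell\le m,\ A\subset H_\ell\}$. Then $\mu(B)\le N_{n-1}/m=N_{n-1}/N_n\le1/2$ because B is the union of at most $N_{n-1}$ sets each of measure $\le1/m$. Now for $t\notin B$, the set $A_{n-1}(t)$ meets at least two different sets $H_\ell$, so that its diameter is $\ge3a$, while $A_{n-1}(t)\cap H_{\ell(t)}$ has diameter $\le\Delta(H_{\ell(t)})\le2a$, so that $f(t)\ge a$.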

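Exercise 4.1.5 (a) Find a partition $\mathcal{B}_n$ with $\operatorname{card}\mathcal{B}_n\le N_n$ and $\Delta(B)\le3e_n(T)$ for $B\in\mathcal{B}_n$. Define $\mathcal{A}_0=\{T\}$ and $\mathcal{A}_n$ as the partition generated by $\mathcal{A}_{n-1}$ and $\mathcal{B}_{n-1}$. Thus, the sequence $(\mathcal{A}_n)$ is admissible, and for $A\in\mathcal{A}_n$, we have $\Delta(A)\le\Delta(T)$ for $n=0$ and $\Delta(A)\le3e_{n-1}(T)$ for $n\ge1$. Writing $e_{-1}(T)=\Delta(T)$, for each $t$ we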


696 F_ Solutions of Selected Exercises 


have $\sum_{n\ge0}(2^{n/\alpha}\Delta(A_n(t)))^p\le K\sum_{n\ge0}(2^{n/\alpha}e_{n-1}(T))^p\le K\sum_{n\ge0}(2^{n/\alpha}e_n(T))^p$. It is then straightforward that this sequence of partitions witnesses (4.9). (b) The key idea is that entropy bounds “are optimal when the space is homogeneous”, so one should try to see what happens on such spaces. A good class of spaces are those of the form $T=\prod_{i\le m}U_i$ where $U_i$ is a finite set of cardinality $N_i$. Given a decreasing sequence $b_i>0$, we define a distance on $T$ as follows. For $s=(s_i)_{i\le m}$ and $t=(t_i)_{i\le m}$, $d(s,t)=b_j$ where $j=\min\{i\le m ; s_i\ne t_i\}$, and then one has $b_n\le e_n(T)\le b_{n-1}$, from which one may estimate the right-hand side of (4.9). To get lower bounds on $\gamma_{\alpha,p}(T,d)$, an efficient method is to use the uniform probability $\mu$ on $T$ and the appropriate version of (3.32).

Exercise 4.1.9 Even though the space $T$ shares with the unit interval the property that $N(T,d,\epsilon)$ is about $1/\epsilon$, there are far more 1-Lipschitz functions on this space than on $[0,1]$. The claim (a) is obvious from the hint. To prove (b), for $\ell\ge0$, let us denote by $\mathcal{B}_\ell$ the partition of $T$ into the $2^\ell$ sets of $T$ determined by fixing the values of $t_1,\dots,t_\ell$. These sets are exactly the balls of radius $2^{-\ell}$. Consider the class $\mathcal{F}_\ell$ of functions $f_\ell$ on $T$ such that $|f_\ell(t)|=2^{-\ell-1}$ for each $t$ and that $f_\ell$ is constant on each set of $\mathcal{B}_\ell$. Then each sum $f=\sum_{\ell\ge0}f_\ell$ where $f_\ell\in\mathcal{F}_\ell$ is 1-Lipschitz. Denoting by $\mathcal{F}$ the class of all such functions $f$, it is already true that $\gamma_{1,2}(\mathcal{F},d_2)\ge\sqrt k/L$. It is not so difficult to prove that $e_n(\mathcal{F},d_2)\ge2^{-n}/L$, and to bound below $\gamma_{1,2}(\mathcal{F},d_2)$ itself, again the appropriate version of (3.32) is recommended, or an explicit construction of a tree as in Sect. 3.2.

Exercise 4.2.2 Try the functional 


$$F(A)=1-\inf\{\|u\| ; u\in A\}\qquad\text{(F.2)}$$
for $A\subset T$, and the growth condition


F (Uesm He) = c*a? 2"? + min F(He) , 
<m 


when the sets $H_\ell$ are $(a,r)$-separated as in Definition 2.8.1.

Exercise 4.3.1 In dimension $d$, a hypercube of side $2^{-k}$ has an excess (or deficit) of points of about $\sqrt{N2^{-dk}}$, and you (heuristically) expect to have to move these points about $2^{-k}$ to match them. Summing over the $2^{dk}$ hypercubes of that side gives a contribution of $\sqrt N2^{k(d/2-1)}$. For $d=3$, it is the large values of $k$ which matter (i.e., the small cubes), while for $d=1$, it is the small values of $k$.
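The heuristic is easy to tabulate. The sketch below (illustrative only; the value of $N$ and the range of $k$ are arbitrary choices) multiplies the three factors of the argument at each scale and reports the ratio to $\sqrt N$, which equals $2^{k(d/2-1)}$: decreasing in $k$ for $d=1$, constant for $d=2$, increasing for $d=3$.

```python
import math


def level_contribution(N, d, k):
    """Heuristic matching cost at scale 2^-k in dimension d:
    (number of cubes) * (excess per cube) * (distance moved)."""
    cubes = 2 ** (d * k)
    excess = math.sqrt(N * 2 ** (-d * k))
    move = 2 ** (-k)
    return cubes * excess * move


N = 2 ** 30
for d in (1, 2, 3):
    # keep scales with at least about one point per cube, i.e. 2^(dk) <= N
    ratios = [level_contribution(N, d, k) / math.sqrt(N) for k in range(30 // d + 1)]
    print(d, [round(r, 3) for r in ratios[:6]])
```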

Exercise 4.3.4 For an optimal matching, we have $\sum_{i\le N}d(X_i,Y_{\pi(i)})=\sum_{i\le N}(f(X_i)-f(Y_{\pi(i)}))$, and since $f(X_i)-f(Y_{\pi(i)})\le d(X_i,Y_{\pi(i)})$ for each $i$, we have $f(X_i)-f(Y_{\pi(i)})=d(X_i,Y_{\pi(i)})$.

Exercise 4.5.2 For $k\le n_0$ denote by $M_k$ the number of points $X_i$ which have not been matched after we perform the $k$-th step of the process. Thus, there are $M_{k-1}-M_k$ points which are matched in the $k$-th step, and these are matched to points within distance $L2^{k-n_0}$. Thus, the cost of the matching is $\le L\sum_{k\le n_0}2^{k-n_0}(M_{k-1}-M_k)\le L\sum_{k\le n_0}2^{k-n_0}M_k$. At the $k$-th step of the process, we match points in a given square $A$ of $\mathcal{H}_{n_0-k+1}$. It should be obvious that the number of points of that square $A$ which are not matched after this step is completed equals the excess of the number of $X_i$ over the number of $Y_i$ in that square, that is, $\max(\operatorname{card}\{i\le N ; X_i\in A\}-\operatorname{card}\{i\le N ; Y_i\in A\},0)$, so that
$$M_k=\sum_{A\in\mathcal{H}_{n_0-k+1}}\max\big(\operatorname{card}\{i\le N ; X_i\in A\}-\operatorname{card}\{i\le N ; Y_i\in A\},0\big)\,.$$
Let $a=2^{-n_0+k-1}$ be the length of the side of $A$, so that the area of $A$ is $a^2$, and the expected number of points $X_i$ in $A$ is $Na^2$. Skipping some details about the $Y_i$, the excess number of points $X_i$ in $A$ is then of expected value $\le L\sqrt{Na^2}=La\sqrt N$. As there are $1/a^2$ elements of $\mathcal{H}_{n_0-k+1}$, the expected value of $aM_k$ is then $\le L\sqrt N$, and since in the summation $\sum_{k\le n_0}$ there are about $\log N$ terms, the result follows.

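Exercise 4.5.8 (a) Let us denote by $\mathcal{C}$ the family of $2^{2n}$ little squares into which we divide $[0,1]^2$. For $\epsilon=(\epsilon_C)_{C\in\mathcal{C}}\in\{-1,1\}^{\mathcal{C}}$, consider the function $f_\epsilon$ such that for $x\in C\in\mathcal{C}$, we have $f_\epsilon(x)=\epsilon_Cd(x,B)$ where $B$ is the boundary of $C$. It should be pretty obvious that $\|f_\epsilon-f_{\epsilon'}\|_2^2\ge2^{-4n}\operatorname{card}\{C\in\mathcal{C} ; \epsilon_C\ne\epsilon'_C\}/L$. Then Exercise 2.5.11 (a) used for $m=2^{2n}$ and $k=m/4$ proves that the class of 1-Lipschitz functions cannot be covered by fewer than $\exp(2^{2n}/L)$ balls of radius $2^{-n}/L$ for $d_2$.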

Exercise 4.5.14 (a) Making the change of variable $u=t^p$ in (2.6), we obtain
$$\mathrm{E}Y^p=\int_0^\infty pt^{p-1}\mathrm{P}(Y\ge t)\,dt\le L\int_0^\infty pt^{p-1}\big(\exp(-t^2/A^2)+\exp(-t/B)\big)\,dt\,.$$
One may calculate these integrals by integration by parts, but it is simpler to use bounds such as $u^{p-1}\exp(-u^2)\le Lp^{(p-1)/2}\exp(-u^2/2)$ for $u=t/A$ for the first term and $u^{p-1}\exp(-u)\le(Lp)^{p-1}\exp(-u/2)$ for $u=t/B$ for the second term. (b) We denote by $\Delta_j(A)$ the diameter of the set $A$ for $d_j$. We consider an admissible sequence $(\mathcal{B}_n)_{n\ge0}$ which satisfies (4.52) and an admissible sequence $(\mathcal{C}_n)_{n\ge0}$ which satisfies (4.53). We define the admissible sequence $(\mathcal{A}_n)$ as in the proof of Theorem 4.5.13. It then follows from (a) that $D_n(A_n(t))\le L(2^{n/2}\Delta_2(C_{n-1}(t))+2^n\Delta_1(B_{n-1}(t)))$, and (4.52), (4.53) and summation imply (4.54).

Exercise 4.5.17 Of course, (4.56) holds in any probability space, not only on $[0,1]^2$! Then $\gamma_1(\mathcal{F},d_\infty)\le L\log N$ by Exercise 2.7.5. Also, $e_n(\mathcal{F},d_2)\le N_n^{-1/2}$, so that $\sum_{n\ge0}2^{n/2}e_n(\mathcal{F},d_2)<\infty$. Then (4.56) shows that $\mathrm{E}S\le L\sqrt N$ where $S=\sup|\operatorname{card}\{i\le N ; X_i\le t\}-Nt|$ for a supremum taken over all $t$ of the type $k/N$. Finally, $\sup_{0\le t\le1}|\operatorname{card}\{i\le N ; X_i\le t\}-Nt|\le3S+1$, using the fact that an interval of the type $]k/N,(k+1)/N]$ contains at most $2S+1$ points $X_i$.
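To see the discretization step concretely, the following Python sketch (an illustration with uniform samples, not part of the argument) computes both the supremum over all $t$ and the supremum $S$ over the grid $t=k/N$. On each interval $[k/N,(k+1)/N]$ one has $|\operatorname{card}\{i\le N ; X_i\le t\}-Nt|\le S+1$, which is even better than the bound $3S+1$ used above.

```python
import bisect
import random


def exact_sup(xs):
    """sup over 0 <= t <= 1 of |card{i : X_i <= t} - N t| (points assumed distinct)."""
    N = len(xs)
    best = 0.0
    for i, x in enumerate(sorted(xs), start=1):
        # at t = x_(i) the count is i; just to the left of x_(i) it is i - 1
        best = max(best, i - N * x, N * x - (i - 1))
    return best


def grid_sup(xs):
    """S: the same supremum restricted to the points t = k/N, 0 <= k <= N."""
    N = len(xs)
    s = sorted(xs)
    return max(abs(bisect.bisect_right(s, k / N) - k) for k in range(N + 1))


rng = random.Random(1)
xs = [rng.random() for _ in range(1000)]
S, sup_all = grid_sup(xs), exact_sup(xs)
print(S, round(sup_all, 3))
```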

Exercise 4.5.20 Using Dudley’s bound in the form (2.38), in the right-hand side, 
there are about log N terms of order 1. 

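Exercise 4.5.22 Denote by $m$ the largest integer such that $2^{-m}\ge1/N$. We find a subset $T$ of $\mathcal{C}$ such that $\operatorname{card}T\le N_m$ and such that for $f\in\mathcal{C}$ we have $d_\infty(f,T)\le L2^{-m/3}\le LN^{-1/3}$. We then have $e_n(T,d_\infty)\le L2^{-n/3}$ for $n\le m$ and $e_n(T,d_\infty)=0$ for $n\ge m$. Thus,
$$\gamma_1(T,d_\infty)\le L\sum_{n\le m}2^ne_n(T,d_\infty)\le L\sum_{n\le m}2^{2n/3}\le L2^{2m/3}\le LN^{2/3}$$
and
$$\gamma_2(T,d_2)\le\gamma_2(T,d_\infty)\le L\sum_{n\le m}2^{n/2}e_n(T,d_\infty)\le L\sum_{n\le m}2^{n/6}\le LN^{1/6}\,,$$
and then, since each $f\in\mathcal{C}$ is within $d_\infty$-distance $LN^{-1/3}$ of $T$,
$$\mathrm{E}\sup_{f\in\mathcal{C}}\Big|\sum_{i\le N}\Big(f(X_i)-\int f\,d\mu\Big)\Big|\le LN^{2/3}$$
by (4.56).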

Exercise 4.5.23 Given $n\ge1$, consider the set $\mathcal{C}$ of 1-Lipschitz functions on $T$ which are $0$ at the point $(0,0,\dots)$ and which depend only on the first $n$ coordinates. The estimate (4.59) used for $\alpha=2$ shows that $\log N(\mathcal{C},d_2,\epsilon)\le\log N(\mathcal{C},d_\infty,\epsilon)\le L/\epsilon^2$, and thus, $\gamma_1(\mathcal{C},d_\infty)\le L2^{n/2}$ and $\gamma_2(\mathcal{C},d_2)\le Ln$. We may then appeal to Theorem 4.5.16 to obtain that $\mathrm{E}\sup_{f\in\mathcal{C}}|\sum_{i\le N}(f(X_i)-\int f\,d\mu)|\le L(n\sqrt N+2^{n/2})$, and the result when $n$ is chosen with $2^n\approx N$.

Exercise 4.6.8 Consider the partition $\mathcal{B}_\ell$ of $T$ obtained by fixing the values of $t_1,\dots,t_\ell$, so that $\operatorname{card}\mathcal{B}_\ell=2^\ell$, and each set $B$ in $\mathcal{B}_{\ell-1}$ contains two sets in $\mathcal{B}_\ell$. Consider the class $\mathcal{F}_\ell$ of functions $f$ on $T$ such that $|f(t)|=2^{-\ell/2}$ for each $t$ and $f$ is constant on each set $B\in\mathcal{B}_\ell$. Thus, as in the case of Exercise 4.1.9, the functions of the type $\sum_{\ell\ge0}f_\ell$ with $f_\ell\in\mathcal{F}_\ell$ are 1-Lipschitz. Assuming for simplicity $N=N_n$ for a certain $n$, to obtain an evenly spread family $(Y_i)$, we simply put one point $Y_i$ in each set of $\mathcal{B}_{2^n}$. For $\ell\le2^n$, we construct a function $f_\ell\in\mathcal{F}_\ell$ as follows. Recalling that each set $B$ in $\mathcal{B}_{\ell-1}$ contains two sets in $\mathcal{B}_\ell$, the function $f_\ell$ equals $2^{-\ell/2}$ on one of these sets and $-2^{-\ell/2}$ on the other. Subject to this rule, we choose $f_\ell$ so that $\sum_{i\le N}f_\ell(X_i)$ is as large as possible. In each set $B\in\mathcal{B}_{\ell-1}$ there are about $2^{-\ell+1}N$ points $X_i$, which typically gives rise to fluctuations of order $\sqrt{2^{-\ell}N}$ between the two halves of $B$, and thus we expect $\sum_{i\le N,X_i\in B}f_\ell(X_i)$ to be of order $2^{-\ell/2}\sqrt{2^{-\ell}N}=2^{-\ell}\sqrt N$ and, summing over the $2^{\ell-1}$ sets $B$, we expect $\sum_{i\le N}f_\ell(X_i)$ to be of order $\sqrt N$. The function $f=\sum_{\ell\le2^n}f_\ell$ is a 1-Lipschitz function with $\sum_{i\le N}f(X_i)$ of order $2^n\sqrt N$ (which is about $\sqrt N\log N$) whereas $\sum_{i\le N}f(Y_i)=0$.

Exercise 4.7.4 See Exercise 2.3.3(c). 

Exercise 4.9.4 (a) Integration by parts as in the case of AKT. (b) The numbers $(a_{p,C})_{p\in\mathbb{Z}}$ are the Fourier coefficients of the function $f_C(x)=\int h_C(t)f(x,t)\,d\mu_m(t)$, so that $\sum_p|a_{p,C}|^2=\int f_C(x)^2\,dx$. Now, fixing a point $t_0$ in $C$, since $\int h_C\,d\mu_m=0$ we have $f_C(x)=\int_Ch_C(t)f(x,t)\,d\mu_m(t)=\int_Ch_C(t)(f(x,t)-f(x,t_0))\,d\mu_m(t)$. We have $|f(x,t)-f(x,t_0)|\le2^{-n}$ because $f$ is 1-Lipschitz in $t$. Also, $|h_C(t)|\le2^{n/2}$, and the integration is restricted to $C$, so that $|f_C(x)|\le2^{-n}\times2^{n/2}\times2^{-n}=2^{-3n/2}$ and $\int f_C(x)^2\,dx\le2^{-3n}$. The case $C=G$ is easier.

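Exercise 4.9.5 (a) For each $C\in\mathcal{C}_n$, we have $\sum_{p\in\mathbb{Z}}|a_{p,C}|^2\le2^{-3n}$. Summing over $C\in\mathcal{C}_n$ and then over $n\le m$ yields $\sum_{p\in\mathbb{Z}}\sum_{0\le n\le m}2^{2n}\sum_{C\in\mathcal{C}_n}|a_{p,C}|^2\le m$ (recalling that $\operatorname{card}\mathcal{C}_n=2^n$), and the desired result follows when combining with (4.129). (b) The ellipsoid $\mathcal{E}$ given by $\sum_{p,C}|a_{p,C}|^2/\alpha_{p,C}^2\le1$ satisfies $\gamma_2(\mathcal{E})\le L\sqrt{\sum_{p,C}\alpha_{p,C}^2}$. We compute $\sum_{p,C}\alpha_{p,C}^2$ by distinguishing three sets of indices and recalling that $\alpha_{p,C}^2\le L\min(p^{-2},m2^{-2n})$ for $C\in\mathcal{C}_n$. First, the set $J_0$ consisting of the indices $(p,G)$. Then the set $J_1$ of indices $(p,C)$ with $C\in\mathcal{C}_n$, $0\le n\le m$, and $|p|\le2^n/\sqrt m$, in which case $\alpha_{p,C}^2\le Lm2^{-2n}$. Finally, the set $J_2$ of indices $(p,C)$ with $C\in\mathcal{C}_n$, $0\le n\le m$, for which $|p|>2^n/\sqrt m$, in which case $\alpha_{p,C}^2\le Lp^{-2}$. Then $\sum_{(p,C)\in J_0}\alpha_{p,C}^2\le L$ and
$$\sum_{(p,C)\in J_1}\alpha_{p,C}^2=\sum_{n\le m}\sum_{C\in\mathcal{C}_n}\sum_{|p|\le2^n/\sqrt m}\alpha_{p,C}^2\le L\sum_{n\le m}2^n\Big(1+\frac{2^n}{\sqrt m}\Big)m2^{-2n}\le Lm^{3/2}\,,$$
and we proceed in a similar way to prove that the other sum is also $\le Lm^{3/2}$.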

Exercise 4.9.6 Since this section is for the experts, this solution will be suitably concise. A function on $U$ is a function of $(x,t)\in[0,1]\times T$. A first observation is that a function $\sum_qF_q$ is always 1/2-Lipschitz in the $t$ variable. Problems arise in the $x$ variable, and the way this is solved is to set $z_{q+1,\ell,C}=0$ whenever the little “square” $[(\ell-1)2^{-(q+1)},\ell2^{-(q+1)}[\times C$ is “dangerous”, that is, the function $f=\sum_{q'\le q}F_{q'}$ is such that at one point of this “square”, we have $|\partial f/\partial x|\ge1/2$. Matters are set up in a way that at most 1/2 of the little squares are dangerous, exactly by the same type of arguments as we used for the AKT theorem. Next, defining $D_{\ell,C}=\sum_{i\le N}(f_{q+1,\ell,C}(X_i)-\int f_{q+1,\ell,C}\,d\theta)$, the point is that $|D_{\ell,C}|$ is often of order $\sqrt N2^{-q-p}\sqrt{2^{-2q-p}}$ because if $h=f_{q+1,\ell,C}-\int f_{q+1,\ell,C}\,d\theta$, then $(\int h^2\,d\theta)^{1/2}$ is about $2^{-q-p}$ (the typical order of the value of $h$ on the support of $f_{q+1,\ell,C}$) times $\sqrt{2^{-2q-p}}$ (the square root of the measure of this support). Now, $\sqrt N2^{-q-p}\sqrt{2^{-2q-p}}=\sqrt N2^{-2q-3p/2}$, and since there are $2^{2q+p}$ terms to be summed, this gives a total contribution of order $\sqrt N2^{-p/2}=\sqrt N/r^{1/4}$ as desired.

Exercise 4.9.8 The magic relation is $(N_n)^{2^k}=N_{n+k}$. By hypothesis for $n\ge0$, there is a partition $\mathcal{A}_n$ of $T$ such that $\operatorname{card}\mathcal{A}_n\le N_n$ and each set of $\mathcal{A}_n$ has diameter $\le L2^{-n}$. Consider a subset $B_n$ of $T_m$ such that each element of $T_m$ is within distance $2^{-n}$ of $B_n$. We can take $\operatorname{card}B_n=2^n$ for $n\le m$ and $\operatorname{card}B_n=2^m$ for $n\ge m$. We classify the elements $f$ of $U$ by looking at which set of $\mathcal{A}_n$ the value $f(s)$ belongs to, for each $s\in B_n$. In this manner, we break $U$ into $(\operatorname{card}\mathcal{A}_n)^{\operatorname{card}B_n}\le N_n^{2^n}=N_{2n}$ sets (or $N_{n+m}$ sets for $n\ge m$). Since $f$ is assumed to be 1-Lipschitz, the diameter of each such set in $U$ is $\le L2^{-n}$. This implies that $e_{2n}(U,D)\le L2^{-n}$ for $n\le m$ and $e_{n+m}(U,D)\le L2^{-n}$ for $n\ge m$, from which the last assertion readily follows.

Exercise 5.2.2 (a) is a consequence of (5.2). (b) In that case, $\sup_{t\in T}X_t=\sum_{i\le N}|Y_i|$ has expectation $N\mathrm{E}|Y_1|$. To show that $\gamma_q(T,d)$ is of the same order, we use the bound $\gamma_q(T,d)\ge2^{n/q}e_n(T)$. According to Exercise 2.5.10, for $2^n=N/L$, $e_n(T)$ is about the diameter $N^{1/p}$ of $T$. (c) The metric space $(T,d)$ consists of $N$ points within distance at most two of each other, and the left-hand side of (5.7) is of order $(\log N)^{1/q}$. However, $\mathrm{E}\sup_{t\in T}X_t=\mathrm{E}\max_{i\le N}Y_i$ is $\ge N^{1/p}/K$. This is because from (5.3) for each $i\le N$, there exists a set $\Omega_i$ of probability $1/N$ on which $Y_i\ge N^{1/p}/K$, where $K$ depends on $p$ only, so that since the sets $\Omega_i$ are independent, $\max_{i\le N}Y_i$ is at least $N^{1/p}/K$ on a set of probability at least $1-(1-1/N)^N\ge1/L$. Thus, $\mathrm{E}\max_{i\le N}Y_i^+\ge N^{1/p}/K$, and it is then a simple matter to conclude.

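Exercise 5.5.1 This is just as simple as it sounds. Considering an increasing sequence $(\mathcal{B}_n)$ of partitions with $\operatorname{card}\mathcal{B}_n\le M_n$, the sequence $(\mathcal{A}_m)$ of partitions such that $\mathcal{A}_m=\mathcal{B}_n$ for $2^n\le m<2^{n+1}$ satisfies $\operatorname{card}\mathcal{A}_m\le M_n=N_{2^n}\le N_m$ and $\sum_{m\ge0}\Delta(A_m(t))\le L\sum_{n\ge0}2^n\Delta(B_n(t))$. Conversely, given an admissible sequence of partitions $(\mathcal{A}_m)$, the sequence $(\mathcal{B}_n)$ given by $\mathcal{B}_n=\mathcal{A}_{2^n}$ satisfies $\sum_{n\ge0}2^n\Delta(B_n(t))\le L\sum_{m\ge0}\Delta(A_m(t))$.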
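Both constructions are purely combinatorial and can be checked on the sequences of diameters along a fixed point $t$. In the sketch below (hypothetical helpers, assuming $\mathcal{B}_0=\{T\}$ so that one may take $\mathcal{A}_0=\mathcal{B}_0$, and using diameter sequences that are non-increasing, as they are for increasing sequences of partitions), the constant $L$ can be taken equal to $2$ in both directions.

```python
import random


def blocks_to_admissible(b):
    """Diameters of A_m(t) built from b[n] = Delta(B_n(t)):
    A_0 = B_0 (assumed to be {T}) and A_m = B_n for 2^n <= m < 2^(n+1)."""
    delta = [b[0]]
    for n, bn in enumerate(b):
        delta.extend([bn] * 2 ** n)
    return delta


def admissible_to_blocks(delta):
    """b[n] = Delta(A_{2^n}(t)) for every 2^n that indexes delta."""
    b, n = [], 0
    while 2 ** n < len(delta):
        b.append(delta[2 ** n])
        n += 1
    return b


def weighted(b):
    """sum_n 2^n b[n]"""
    return sum(2 ** n * bn for n, bn in enumerate(b))


rng = random.Random(2)
b = sorted((rng.random() for _ in range(8)), reverse=True)
print(sum(blocks_to_admissible(b)) <= 2 * weighted(b))       # first direction
d2 = sorted((rng.random() for _ in range(300)), reverse=True)
print(weighted(admissible_to_blocks(d2)) <= 2 * sum(d2))    # converse
```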

Exercise 6.2.1 (a) 


$$\mathrm{E}\Phi(X)=\mathrm{E}\sup_{f\in\mathcal{C}}f(X)\ge\sup_{f\in\mathcal{C}}\mathrm{E}f(X)=\sup_{f\in\mathcal{C}}f(\mathrm{E}X)=\Phi(\mathrm{E}X)\,.$$


(b) Use Jensen’s inequality for the function $\Phi(y)=|x+y|^p$. (c) Use Jensen’s
inequality conditionally on @ and X. 

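Exercise 6.3.5 Write $Y^2=Y^{1/2}Y^{3/2}$, and use the Cauchy-Schwarz inequality, which gives $(\mathrm{E}Y^2)^2\le\mathrm{E}Y\,\mathrm{E}Y^3$. Take $Y=|\sum_{i\ge1}\epsilon_it_i|$, and use that $\mathrm{E}Y^3\le L\|t\|_2^3$ by Khintchin’s inequality (6.3).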

Exercise 6.4.2 If $t_i\in\{0,b\}$, then $\sum_i|t_i|\le a^2/b$, so that the left-hand side of (6.21) is bounded by $a^2/b$.

Exercise 6.6.3 An instructive example is given by $T=\{t_1,\dots,t_N\}$ where $t_n=(t_{n,i})_{i\ge1}$, $t_{n,i}=0$ if $i\ne n$ and $t_{n,n}=1$. Then $\gamma_2(T)$ is about $\sqrt{\log N}$ and $\gamma_1(T,d_\infty)$ is about $\log N$.
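For this example, $\gamma_2(T)$ is comparable to $\mathrm{E}\max_{n\le N}g_n$ for i.i.d. standard Gaussian $g_n$ (the canonical process is $X_{t_n}=g_n$). A seeded Monte Carlo sketch in Python (sample sizes are arbitrary) shows the $\sqrt{\log N}$ growth against the classical bound $\mathrm{E}\max_{n\le N}g_n\le\sqrt{2\log N}$.

```python
import math
import random


def mean_max_gaussian(N, trials, seed=0):
    """Monte Carlo estimate of E max_{n <= N} g_n for i.i.d. standard Gaussians g_n,
    i.e. of E sup_t X_t for the canonical process on T = {t_1, ..., t_N}."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0.0, 1.0) for _ in range(N))
    return total / trials


for N in (10, 100, 1000):
    est = mean_max_gaussian(N, trials=300)
    print(N, round(est, 3), round(math.sqrt(2 * math.log(N)), 3))
```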

Exercise 7.1.5 Given $t\in T$, the sets $B$ and $t+B$ are not disjoint because $\mu(B\cap(B+t))=2\mu(B)-\mu(B\cup(B+t))\ge2\mu(B)-1>0$, and if $s\in B$ and $s\in t+B$, that is, $s=t+u$ with $u\in B$, then $t=s-u\in B-B$.
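The argument is easy to test in a finite cyclic group $\mathbb{Z}_n$, whose Haar measure is the normalized counting measure. The sketch below (an illustration; the choice of $\mathbb{Z}_n$ and the random choice of $B$ are assumptions, not from the text) checks that $B-B=\mathbb{Z}_n$ whenever $\operatorname{card}B>n/2$, and that the conclusion can fail when $\mu(B)=1/2$.

```python
import random


def difference_set(B, n):
    """B - B computed in the cyclic group Z_n."""
    return {(s - u) % n for s in B for u in B}


def covers_group(n, seed=0):
    """Draw a random B in Z_n with card B > n/2 (Haar measure > 1/2) and test B - B = Z_n."""
    rng = random.Random(seed)
    B = set(rng.sample(range(n), n // 2 + 1))
    return difference_set(B, n) == set(range(n))


print(all(covers_group(n, seed) for n in range(1, 40) for seed in range(5)))
# With measure exactly 1/2 the conclusion can fail: the even residues of Z_10.
print(difference_set(set(range(0, 10, 2)), 10) == set(range(10)))
```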

Exercise 7.1.8 This follows from Theorem 7.1.1 since
$$\inf\{\epsilon>0 ; \mu(B_d(0,\epsilon))\ge2^{-2^n}=N_n^{-1}\}$$


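Exercise 7.3.2 (a) Given $\omega$, the set $A_\omega=\{t\in T ; d_\omega(0,t)<\Delta(T,d_\omega)/4\}$ satisfies $\mu(A_\omega)\le1/2$. Indeed, if this is not the case, then for each $s\in T$, we have $(s+A_\omega)\cap A_\omega\ne\emptyset$, that is, $T=A_\omega-A_\omega$. Then given $a,b\in T$, we have $a-b=s-t$ for some $s,t\in A_\omega$ and
$$d_\omega(a,b)=d_\omega(a-b,0)=d_\omega(s-t,0)=d_\omega(s,t)\le d_\omega(s,0)+d_\omega(0,t)\,.$$
Since $d_\omega(s,0)<\Delta(T,d_\omega)/4$ and $d_\omega(t,0)<\Delta(T,d_\omega)/4$ by the definition of $A_\omega$, we obtain $d_\omega(a,b)<\Delta(T,d_\omega)/2$ for all $a,b\in T$, which is absurd. (b) Setting $B_\omega=\{s ; d_\omega(0,s)\ge\Delta(T,d_\omega)/4\}$, we then have $\mu(B_\omega)\ge1/2$. Consequently
$$\int\mathrm{E}\big(\Delta(T,d_\omega)\mathbf{1}_{\{s\in B_\omega\}}\big)\,d\mu(s)\ge(1/2)\mathrm{E}\Delta(T,d_\omega)\,,$$
and thus there exists $s\in T$ such that $\mathrm{E}(\Delta(T,d_\omega)\mathbf{1}_{\{s\in B_\omega\}})\ge(1/2)\mathrm{E}\Delta(T,d_\omega)$. Now, since $\Delta(T,d_\omega)\le4d_\omega(s,0)$ when $s\in B_\omega$,
$$\mathrm{E}\big(\Delta(T,d_\omega)\mathbf{1}_{\{s\in B_\omega\}}\big)\le4\mathrm{E}d_\omega(s,0)\le Ld(s,0)\le L\Delta(T,d)\,.$$
Thus $\mathrm{E}\Delta(T,d_\omega)\le L\Delta(T,d)\le L\gamma_2(T,d)$.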

Exercise 7.3.3 Choose, for example, $T=\{1,\dots,N\}$. For a subset $I$ of $T$, define the distance $d_I$ on $T$ by $d_I(s,t)=0$ if $s,t\notin I$ and $d_I(s,t)=1$ otherwise ($s\ne t$). If $\operatorname{card}I=k$, then $\gamma_2(T,d_I)$ is about $\sqrt{\log k}$. On the other hand, the distance $d$ on $T$ such that $d^2$ is the average of the distances $d_I$ over all possible choices of $I$ with $\operatorname{card}I=k$ satisfies
$$d(s,t)^2=1-\binom{N-2}{k}\Big/\binom{N}{k}=\frac{k(2N-k-1)}{N(N-1)}\le\frac{2k}{N}$$
for $s\ne t\in T$, so that $\gamma_2(T,d)\le L\sqrt{k/N}\sqrt{\log N}$.
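The value of $d(s,t)^2$ can be checked by brute force for small $N$. In the Python sketch below (exact rational arithmetic; the ranges of $N$ and $k$ are arbitrary, and points are labelled $0,\dots,N-1$ for convenience), the average of $d_I(s,t)$ over all $I$ with $\operatorname{card}I=k$ is compared with the closed form above.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def d_I(s, t, I):
    """The distance of the text: 0 if s = t or if s, t both lie outside I, and 1 otherwise."""
    if s == t or (s not in I and t not in I):
        return 0
    return 1


def averaged_d2(N, k, s=0, t=1):
    """Average of d_I(s, t) over all subsets I of {0, ..., N-1} with card I = k.
    Since d_I takes only the values 0 and 1, this is also the average of d_I(s, t)^2."""
    subsets = list(combinations(range(N), k))
    return Fraction(sum(d_I(s, t, set(I)) for I in subsets), len(subsets))


def closed_form(N, k):
    """1 - C(N-2, k) / C(N, k): the probability that I contains s or t."""
    return 1 - Fraction(comb(N - 2, k), comb(N, k))


print(averaged_d2(8, 3), closed_form(8, 3), Fraction(3 * (2 * 8 - 3 - 1), 8 * 7))
```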

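Exercise 7.3.8 Observe that $\mathrm{E}\|\xi_{i_0}x_{i_0}\|=\mathrm{E}|\xi_{i_0}|\,\|x_{i_0}\|$. Use Jensen’s inequality for the lower bound, and the triangle inequality $\mathrm{E}\|\sum_i\xi_ix_i\|\le\mathrm{E}\|\xi_{i_0}x_{i_0}\|+\mathrm{E}\|\sum_{i\ne i_0}\xi_ix_i\|$ and (7.33) for the upper bound.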

Exercise 7.3.9 The Haar measure yz is normalized so that w(T) = 1. If |t — 
1] < 272-2", then d(1, t) < ny where n2 = LD, c min(1, |i|22-2""') and so that 
WL(BC, n)) = N,? and thus €, < 7,. It remains only to show that yeu 2n/2n,, < 
LY +09 2"/?W bn. For this, we write n2 < Ly en DENZ Na? + L Ds n bx 80 that 
Jim < LY pen VORNEN, | +L ps, be, from which the result follows. 

Exercise 7.3.10 (a) Use that $A^s(u)=\sum_ia_i\chi_i(s)\chi_i(u)$ and that the $\chi_i$ are orthonormal. (b) $a_{i_0}=\int A(s)\,d\mu(s)$. (c) From (7.36). (d) With obvious notation, the hint implies that $d_{AB}(s,t)\le\|A\|d_B(s,t)+\|B\|d_A(s,t)$, so that (4.55) and Exercise 2.7.4 imply that $\gamma_2(T,d_{AB})\le L\|B\|\gamma_2(T,d_A)+L\|A\|\gamma_2(T,d_B)$, and the result follows by (c).

Exercise 7.3.11 (a) Note that $\chi^t=\chi(t)\chi$, so that $U(\chi)(t)=U(\chi)^t(0)=U(\chi^t)(0)=\chi(t)U(\chi)(0)=u_\chi\chi(t)$ where $u_\chi=U(\chi)(0)$. Thus, $U(\chi)=u_\chi\chi$. (b) $X_s-X_t=U(A)^s-U(A)^t=U(A^s)-U(A^t)=U(A^s-A^t)$, so that $\|X_s-X_t\|_\psi\le\|U\|_{2,\psi}\|A^s-A^t\|_2=\|U\|_{2,\psi}d(s,t)$. Then (2.60) states that $\int\sup_{s,t\in T}|U(A)^s(u)-U(A)^t(u)|\,d\mu(u)\le L\|U\|_{2,\psi}\gamma_2(T,d)$, but the integrand is independent of $u$ and equals $\sup_{s,t\in T}|U(A)(s)-U(A)(t)|$. Furthermore, $\gamma_2(T,d)\le LN(A)$. (c)


ayo) = f UAr@IaucH) < sup |U(A)(s) — U(A)(@)] . 


s,teT 


Now, f U(A)(t)du(t) = ajyuiy. We have seen that |ajg) < LN (A) and |uj)| = 
UA) |Il2 < IU lla,y._ since ||1]]2 = 1. 

Exercise 7.4.1 The archetypical example is when $(x_i)$ is the canonical basis of $\ell^\infty$, and the example boils down to $\mathrm{E}\sup_{i\le n}|\epsilon_i|=1$ and $\mathrm{E}\sup_{i\le n}|g_i|$ of order $\sqrt{\log n}$, as follows from Exercise 2.3.7.

Exercise 7.4.3 Consider the set $T$ of sequences $(t_i)_{i\le N}$ with $t_i=\pm1$ and $\operatorname{card}\{i\le N ; t_i=-1\}\le\sqrt N$. Consider the function $x_i$ on $T$ given by $x_i(t)=t_i$. Then for $t\in T$, we have $|\sum_{i\le N}\epsilon_ix_i(t)-\sum_{i\le N}\epsilon_i|\le2\operatorname{card}\{i\le N ; t_i=-1\}\le2\sqrt N$, so that, since $\mathrm{E}|\sum_{i\le N}\epsilon_i|\le\sqrt N$, we get $\mathrm{E}\sup_t|\sum_{i\le N}\epsilon_ix_i(t)|\le3\sqrt N$. On the other hand, $\sum_{i\le N}g_ix_i(t)=\sum_{i\le N}g_i-2\sum_{i ; t_i=-1}g_i$, so that $\mathrm{E}\sup_t\sum_{i\le N}g_ix_i(t)=2\mathrm{E}\sup_J\sum_{i\in J}-g_i$, where the supremum is taken over all the sets $J$ of cardinality $\le\sqrt N$. Consider $a$ such that $\mathrm{P}(-g\ge a)=1/\sqrt N$, where $g$ is standard Gaussian. The events $A_i=\{-g_i\ge a\}$ are independent, each of probability $1/\sqrt N$, so that about $N/\sqrt N=\sqrt N$ of them occur. This shows that $\sup_J\sum_{i\in J}-g_i$ is typically of order $\sqrt Na$, and $a$ is about $\sqrt{\log N}$, so that $\mathrm{E}\sup_J\sum_{i\in J}-g_i$ is of order $\sqrt{N\log N}$.
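Both suprema can be computed exactly for a given sample, which makes the comparison easy to observe. In the sketch below (seeded Monte Carlo; the sizes are arbitrary choices), the Gaussian supremum is obtained by adding the largest values of $-g_i$, and the Rademacher supremum by flipping at most $\sqrt N$ signs; the printed ratios are only indicative.

```python
import heapq
import math
import random


def gaussian_sup(N, rng):
    """sup over sets J with card J <= sqrt(N) of sum_{i in J} (-g_i):
    take the largest values of -g_i and keep only the positive ones."""
    k = math.isqrt(N)
    values = [-rng.gauss(0.0, 1.0) for _ in range(N)]
    return sum(v for v in heapq.nlargest(k, values) if v > 0)


def rademacher_abs_sup(eps):
    """sup over t in T of |sum_i eps_i t_i|, T = {t in {-1,1}^N : card{i : t_i = -1} <= sqrt(N)}.
    Flipping t_i from 1 to -1 changes the sum by -2 eps_i."""
    k = math.isqrt(len(eps))
    s = sum(eps)
    hi = s + 2 * min(k, eps.count(-1))  # largest value of sum_i eps_i t_i
    lo = s - 2 * min(k, eps.count(1))   # smallest value of sum_i eps_i t_i
    return max(abs(hi), abs(lo))


rng = random.Random(3)
N, trials = 10_000, 20
g_mean = sum(gaussian_sup(N, rng) for _ in range(trials)) / trials
r_mean = sum(rademacher_abs_sup([rng.choice((-1, 1)) for _ in range(N)]) for _ in range(trials)) / trials
print(round(g_mean / math.sqrt(N * math.log(N)), 3), round(r_mean / math.sqrt(N), 3))
```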

Exercise 7.4.4 (a) is an application of Theorem 6.2.8 to the set $\{(a_i\chi_i(t))_{i\ge1} ; t\in T\}$. We write $a_i\chi_i(t)=u_i(t)+v_i(t)$ where $\sup_t\sum_i|v_i(t)|\le LS$ and $\mathrm{E}\sup_t|\sum_iu_i(t)g_i|\le LS$. We then have $a_i\chi_i(s+t)=u_i(s+t)+v_i(s+t)$, so that $a_i\chi_i(t)=\chi_i(-s)u_i(s+t)+\chi_i(-s)v_i(s+t)$. Averaging over $s$ for the Haar measure $d\mu$, we obtain $a_i\chi_i(t)=u'_i(t)+v'_i(t)$ where $u'_i(t)=\int\chi_i(-s)u_i(s+t)\,d\mu(s)=\int\chi_i(t-s)u_i(s)\,d\mu(s)=\chi_i(t)u'_i(0)$. Similarly, $v'_i(t)=\chi_i(t)v'_i(0)$, so that $a_i=u'_i(0)+v'_i(0)$. Moreover, $\sum_i|v'_i(0)|\le LS$, so that $\mathrm{E}\sup_t|\sum_ig_iv'_i(0)\chi_i(t)|\le\mathrm{E}\sum_i|g_i||v'_i(0)|\le LS$. Finally (easily) for each $s\in T$, we have $\mathrm{E}\sup_t|\sum_ig_iu_i(t)\chi_i(-s)|\le LS$, so that $\mathrm{E}\sup_t|\sum_ig_iu_i(t+s)\chi_i(-s)|\le LS$ and by averaging over $s$, $\mathrm{E}\sup_t|\sum_ig_iu'_i(t)|\le LS$. Consequently, since $a_i\chi_i(t)=u'_i(t)+v'_i(0)\chi_i(t)$, we have $\mathrm{E}\sup_t|\sum_ig_ia_i\chi_i(t)|\le LS$.

Exercise 7.4.9 We will do only the easy part. Assume without loss of generality that the sequence $(|a_i|)$ is non-increasing. Then $V_n=\{t\in T ; \forall i\le2^n, t_i=1\}$ is a neighborhood of the identity $1$ of Haar measure $1/N_n$, and for $t\in V_n$, we have
$$d(t,1)^2\le4\sum_{i>2^n}|a_i|^2\le4\sum_{m\ge n}2^m|a_{2^m}|^2\,,$$
so that $\epsilon_n\le2\sum_{m\ge n}2^{m/2}|a_{2^m}|$ and $\sum_{n\ge0}2^{n/2}\epsilon_n\le L\sum_{m\ge0}2^m|a_{2^m}|\le L\sum_i|a_i|$.

Exercise 7.5.3 Since $\mathrm{E}(|Z_j(s)-Z_j(t)|\wedge1)\le\mathrm{P}(Z_j\ne0)$, we have $\varphi_j(s,t)\le1$ for all $s,t\in G$, and the claim is obvious.

Exercise 7.5.4 (a) Write the triangle inequality for the distance $\sqrt{\varphi_j}$ and raise to the square, using that $(a+b)^2\le2(a^2+b^2)$. (b) It suffices to consider the case of $D-D$. We prove first that for $s\in D-D$, we have $\varphi_j(s,0)\le2d$. For this, we write $s=a-b$ where $a,b\in D$, so that using (a) and translation invariance, $\varphi_j(s,0)=\varphi_j(a-b,0)=\varphi_j(a,b)\le2(\varphi_j(a,0)+\varphi_j(0,b))\le2d$. We then use (a) again to conclude. (c) By (7.67) for $n\ge1$, the set $D_n=\{s\in T ; \varphi_{j_n}(s,0)\le2^n\}$ satisfies $\mu(D_n)\ge1/N_n$. According to Lemma 7.1.3, we can cover $T$ by at most $N_n$ translates of $D_n-D_n$. According to Exercise 2.7.6, there exists an admissible sequence of partitions $(\mathcal{A}_n)$ such that $\mathcal{A}_0=\{T\}$ and such that for $n\ge1$, each element $A\in\mathcal{A}_n$ is included in a translate of $D_{n-1}-D_{n-1}$, so that by (b) for $s,t\in A$, we have $\varphi_{j_{n-1}}(s,t)\le4\cdot2^{n-1}$.

Exercise 7.5.7 When $\varphi_j(s,t)=\sum_i(|r^ja_i(\chi_i(s)-\chi_i(t))|^2\wedge1)<1$, we have $\sum_i|r^ja_i(\chi_i(s)-\chi_i(t))|^2<1$. We then integrate in $s$ with respect to $\mu$ using (7.34).

Exercise 7.5.12 Using symmetry and independence, we have $\mathrm{E}|\sum_ia_i\theta_i|\le L\mathrm{E}\sqrt{\sum_ia_i^2\theta_i^2}$. Given $u>0$, let $\Omega_u$ be the event defined by $|a_i\theta_i|\le u$ for each $i$. Then $\mathrm{P}(\Omega_u^c)\le\sum_i\mathrm{P}(|\theta_i|>u/|a_i|)\le KSu^{-p}$ where $S=\sum_i|a_i|^p$. The trick is then to compute $\mathrm{E}a_i^2\theta_i^2\mathbf{1}_{\{|a_i\theta_i|\le u\}}$. Using (2.6) as in (7.88), one obtains that this is $\le Ku^{2-p}|a_i|^p$. Thus, $\mathrm{E}\sum_ia_i^2\theta_i^2\mathbf{1}_{\Omega_u}\le KSu^{2-p}$, and Markov’s inequality proves that $\mathrm{P}(\Omega_u\cap\{\sum_ia_i^2\theta_i^2\ge u^2\})\le KSu^{-p}$. Finally, we have proved that $\mathrm{P}(\sqrt{\sum_ia_i^2\theta_i^2}\ge u)\le KSu^{-p}$, from which it follows that $\mathrm{E}\sqrt{\sum_ia_i^2\theta_i^2}\le KS^{1/p}$. To prove that $S\le K\Delta(T,d_p)^p$, we consider the inequality $\sum_i|a_i|^p|\chi_i(s)-1|^p\le\Delta(T,d_p)^p$, and we integrate in $s\in T$, using that $\int|\chi_i(s)-1|^p\,d\mu(s)\ge1/K$ since $\int|\chi_i(s)-1|^2\,d\mu(s)=2$ and since $2^{2-p}|x|^p\ge|x|^2$ for $|x|\le2$.

Exercise 7.6.5 This is a consequence of Lemma 7.6.4, setting $T_\omega=\{t\in T ; \omega\in\Omega_t\}$, and using Lemma 7.6.1 to obtain that $\mathrm{E}\mu(T_\omega)\ge c$ since $\mathrm{P}(t\in T_\omega)=\mathrm{P}(\Omega_t)\ge c$ for each $t$.

Exercise 7.8.5 (b) For $s,t\in U$, $s\ne t$, since $s-t\notin B$, there exists $i\le N$ with $|\chi_i(s-t)-1|\ge a/2$, so that $|\chi_i(s)-\chi_i(t)|\ge a/2$ and $|\chi_i(5s)-\chi_i(5t)|\ge2a$ by (a). Now for $u\in A$, we have $|\chi_i(5s+u)-\chi_i(5s)|=|\chi_i(u)-1|<a$. Thus, for $u,v\in A$, we have $\chi_i(5s+u)\ne\chi_i(5t+v)$, and hence $5s+u\ne5t+v$. (c) We can find $U$ as in (b) with $\operatorname{card}U\ge\mu(A)/\mu(B)$ because for the largest possible $U$, the sets $s+B$ for $s\in U$ cover $A$. On the other hand, since the sets $5s+A$ for $s\in U$ are disjoint, we have $\mu(A)\operatorname{card}U\le1$. Thus, $\mu(A)^2/\mu(B)\le1$.

Exercise 7.8.17 This is Fubini’s theorem. Informally, we take expectation in the equality $\mu(D_n\setminus B_{n,\omega})=\int_{D_n}\mathbf{1}_{\{s\notin B_{n,\omega}\}}\,d\mu(s)$.

Exercise 7.8.20 Instead of integrating over all 7, integrate over Dp = {s € 
T; pjg(s,0) < 1} and use that fp, |Zi(s) — Zi()|' = |Z; (0)|? (as we have seen 
several times). 

Exercise 7.9.8 So for s € Dn, we have );-; |ail7|xi(s) — 1|? < €2, and by 
integration over D,,, we get )0j. U, |i Ps de. Assuming without loss of generality 
that |a;| > 0 fori € J, this shows that N, U, = @ and consequently that J = Un In. 
Using Theorem 7.8.1, this shows also that El] }0;<7, aigi xl < Lo" 6,. 

Exercise 7.9.9 The canonical distance $d$, given by $d(s,t)^2=\sum_{i\le N}|\chi_i(s)-\chi_i(t)|^2$, satisfies
$$\int_Dd(s,0)^2\,d\mu(s)=2N\mu(D)-2\operatorname{Re}\int_D\sum_{i\le N}\chi_i(s)\,d\mu(s)\,.$$
By (7.125), we have
$$\Big|\int_D\sum_{i\le N}\chi_i(s)\,d\mu(s)\Big|\le LC\sqrt N\mu(D)\sqrt{\log(2/\mu(D))}\,,$$
so that if $\log(2/\mu(D))\le N/(LC^2)$, we have $\int_Dd(s,0)^2\,d\mu(s)\ge N\mu(D)$ and $\sup_{s\in D}d(s,0)^2\ge N$. This proves that $\mu(\{s ; d(s,0)<\sqrt N\})\le2\exp(-N/(LC^2))$, from which the result follows by (7.4).

Exercise 7.10.2 (a) The series $\sum_{i\ge1}\mathrm{P}(|W_i|\ge a)$ converges since $a^2\mathrm{P}(|W_i|\ge a)\le\mathrm{E}(W_i^2\wedge a^2)$, and so does the series $\sum_iW_i\mathbf{1}_{\{|W_i|>a\}}$ because a.s. it has only finitely many nonzero terms. Thus, it suffices to prove the convergence of the series $\sum_{i\ge1}W_i\mathbf{1}_{\{|W_i|\le a\}}$, but symmetry and (7.197) imply that this series converges in $L^2$ (using Cauchy’s criterion) and hence in probability. The conclusion then follows from Lemma 7.10.1. (b) If the series $\sum_iW_i$ converges a.s., it has finitely many terms which are $\ge1$ in absolute value, so the series $\sum_iW_i\mathbf{1}_{\{|W_i|\le1\}}$ also converges, and we may assume $|W_i|\le1$ without loss of generality. Consider a finite set $I$ of indices and $X=(\sum_{i\in I}W_i)^2$. Then $\mathrm{E}X=\sum_{i\in I}\mathrm{E}W_i^2$ and
$$\mathrm{E}X^2=\sum_{i_1,i_2,i_3,i_4\in I}\mathrm{E}W_{i_1}W_{i_2}W_{i_3}W_{i_4}=3\sum_{i,j\in I,i\ne j}\mathrm{E}W_i^2\mathrm{E}W_j^2+\sum_{i\in I}\mathrm{E}W_i^4\,,$$
and since $\mathrm{E}W_i^4\le\mathrm{E}W_i^2$, we obtain that when $\mathrm{E}X\ge1$, then $\mathrm{E}X^2\le3(\mathrm{E}X)^2+\mathrm{E}X\le4(\mathrm{E}X)^2$, so that by the Paley-Zygmund inequality (6.15), we obtain $\mathrm{P}(X\ge\mathrm{E}X/2)\ge1/L$, and this should make the result obvious.

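Exercise 7.10.3 If $\mathrm{P}(|f|\le\delta)\ge1-\delta$, then $\int_\Omega|f|\,d\mathrm{P}\le\delta$ where $\Omega=\{|f|\le\delta\}$ satisfies $\mathrm{P}(\Omega)\ge1-\delta$. Conversely, if $\int_\Omega|f|\,d\mathrm{P}=\alpha$ is small, then by Markov’s inequality, $\mathrm{P}(\Omega\cap\{|f|\ge\sqrt\alpha\})\le\sqrt\alpha$, so that $\mathrm{P}(|f|\ge\sqrt\alpha)\le\sqrt\alpha+\mathrm{P}(\Omega^c)$. Finally, if $\mathrm{E}|f|^2=\alpha$ is small, then $\mathrm{P}(|f|\ge\alpha^{1/4})\le\sqrt\alpha$.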

Exercise 7.12.3 Here in (7.38), one has c; = |a;|”, so the condition 


$$\sum_{n\ge0}2^{n/2}\Big(\sum_{i\ge N_n}|a_i|^2\Big)^{1/2}<\infty$$


is identical to the condition }°,,.9 27/7, <0, 

Exercise 8.2.2 You prove as usual that for $|\lambda|\le1/2$, we have $\mathrm{E}\exp\lambda Y\le\exp L\lambda^2$, and using (2.6) that for $|\lambda|\ge1/2$ you have $\mathrm{E}\exp|\lambda Y|\le\exp(K|\lambda|^q)$ where $1/p+1/q=1$, so that $\mathrm{E}\exp\lambda Y\le\exp(K\max(\lambda^2,|\lambda|^q))$ for $p\le2$ and $\mathrm{E}\exp\lambda Y\le\exp(K\min(\lambda^2,|\lambda|^q))$ for $p\ge2$. The rest is straightforward.

Exercise 8.2.10 We prove first that if $t\in B(u)$, then the sequence $t'=(t'_i)_{i\ge1}$ where $t'_i=t_i\mathbf{1}_{\{|t_i|\le4\}}$ satisfies $\|t'\|_2\le4\sqrt u$. Let us assume for contradiction that $S:=\sum_{i\ge1}(t'_i)^2>16u$. Then $a_i:=t'_i\sqrt{u/S}$ satisfies $|a_i|\le1$, so that $U(a_i)=a_i^2$ and $\sum_{i\ge1}U(a_i)=u$. Thus, by definition of $B(u)$, we have $\sum_{i\ge1}a_it_i\le u$. However, this is impossible because $\sum_{i\ge1}a_it_i=\sqrt{Su}>4u$.

Next, consider the sequence $t''=(t''_i)_{i\ge1}$ where $t''_i=t_i\mathbf{1}_{\{|t_i|>4\}}$. Our goal is to prove that $\sum_i|t''_i|^q\le2^{2q-1}u$. Assuming that this is not the case, we may by decreasing some $t_i$ if necessary assume that $S:=\sum_{i\ge1}|t''_i|^q=2^{2q-1}u$. Let $a_i=c\operatorname{sign}(t''_i)|t''_i|^{q/p}$ where $c=(u/(2S))^{1/p}=4^{-q/p}$, so that $|a_i|\ge1$ if $a_i\ne0$. Thus, $U(a_i)\le2|a_i|^p$ and $\sum_{i\ge1}U(a_i)\le2\sum_{i\ge1}|a_i|^p=2Sc^p=u$. But then, $\sum_{i\ge1}a_it_i=c\sum_{i\ge1}|t''_i|^{1+q/p}=cS=2u>u$, which contradicts the definition of $B(u)$.

Exercise 8.3.6 Assume without loss of generality that $N=N_\tau$ for a certain integer $\tau$. For each integer $p$, consider the set $H_p$ of sequences $t=(t_i)$ with the following properties: each $t_i$, with $0\le t_i\le1$, is a multiple of $1/N$, at most $p$ coordinates $t_i$ are not $0$, and the corresponding $i$ are $\le N$. Then, since there are at most $N^p$ ways to choose the indices $i$ where $t_i\ne0$, by trivial bounds, one has $\operatorname{card}H_p\le N^{2p}=2^{p2^{\tau+1}}$. Let us define $T_n=\{0\}$ if $n\le\tau$ and $T_n=H_{p(n)}$ where $p(n)=2^{n-\tau-1}$ otherwise. Thus, $\operatorname{card}T_n\le N_n$. Fix $t\in\operatorname{conv}T$ and assume without loss of generality that the sequence $(t_i)$ is non-increasing. Now, since $p(n)-p(n-1)=2^{n-\tau-2}$ for $n\ge\tau+2$, we have $\sum_{\tau+2\le n\le2\tau+1}t_{p(n)}(p(n)-p(n-1))\le\sum_it_i\le1$, so that $\sum_{\tau<n\le2\tau+1}2^nt_{p(n)}\le L2^\tau$. Obviously, $d(t,T_n)\le t_{p(n)}+1/N$, so that we have shown that $\sum_{\tau<n\le2\tau+1}2^nd(t,T_n)\le L2^\tau=L\log N$. Now $\sum_{n\le\tau}2^nd(t,T_n)\le L2^\tau$ because $d(t,T_n)\le1$, and $\sum_{n>2\tau+1}2^nd(t,T_n)$ can be bounded by the usual dimensionality arguments.
Exercise 8.3.7 First observe that $\gamma_1(T,d_\infty)$ is of order $\log k$. On the other hand, if $x_i$ denotes the $i$-th coordinate function, $\|\sum_ia_ix_i\|_\infty=\sum_i|a_i|$, from which it follows (by considering the case where $a_i\in\{0,2/k\}$ and $\sum_ia_i=1$) that $\log N(\operatorname{conv}T,d_\infty,1/4)$ is of order $k$, so that $\gamma_1(\operatorname{conv}T,d_\infty)$ is at least of order $k$. Note that in that case, $\gamma_2(T,d_2)$ is of order $2^{k/2}\sqrt{\log k}$.

Exercise 9.2.3 For n > 0,5,t € A € An, we have f’|2/4)(s(w) — t(w))|? A 
Idv() < u2", and since Y 9 2"2-@A << 27) 2"r-), Theorem 9.2.1 
indeed follows from the special case r = 2. 

Exercise 9.4.5 Obviously, $\psi_j$ is translation invariant, $\psi_j(t+s,t'+s)=\psi_j(t,t')$. And since it is the square of a distance, it satisfies the inequality $\psi_j(t,t'')\le2(\psi_j(t,t')+\psi_j(t',t''))$. A simple consequence is that for a set $C=\{s\in T ; \psi_j(s,0)\le u\}$ and any $s\in T$, for $t,t'\in s+C-C$ we have $\psi_j(t,t')\le16u$. By Lemma 7.1.3, $T$ can be covered by $\le N_n$ translates $s+C_n-C_n$ of $C_n-C_n$. So for each $n$, we have a partition $\mathcal{B}_n$ of $T$ into $N_n$ appropriately small sets, and we construct the required admissible sequence of partitions in the usual manner, $\mathcal{A}_n$ being the partition generated by $\mathcal{A}_{n-1}$ and $\mathcal{B}_{n-1}$.

Exercise 9.4.6 The idea is to apply Theorem 9.4.1 using the sequence of partitions built in the previous exercise. A problem is that the sequence of partitions lives on $T$, whereas Theorem 9.4.1 applies to sets of sequences, so some translation is necessary. For this, it helps to write our finite sums $\sum_i$ as infinite sums $\sum_{i\ge1}$, with the understanding that the terms of the sum are eventually zero. We may assume that the sequence $(j_n)_{n\ge0}$ is non-decreasing, simply by replacing $j_n$ by $\max_{k\le n}j_k$. For $s\in T$ and $i\ge1$, let us define $\theta_i(s)=a_i(\chi_i(s)-\chi_i(0))$ and $\theta(s)=(\theta_i(s))_{i\ge1}\in\ell^2$. Thus,
$$\sup_{s\in T}\Big|\sum_{i\ge1}\epsilon_ia_i(\chi_i(s)-\chi_i(0))\Big|=\sup_{s\in T}\Big|\sum_{i\ge1}\epsilon_i\theta_i(s)\Big|=\sup_{x\in\theta(T)}\Big|\sum_{i\ge1}\epsilon_ix_i\Big|$$
and $\sum_i(|r^j(\theta_i(s)-\theta_i(t))|^2\wedge1)=\psi_j(s,t)$. We transport the admissible sequence $(\mathcal{A}_n)$ constructed in Exercise 9.4.5 to $T^*:=\theta(T)$ to obtain an admissible sequence $(\mathcal{A}^*_n)$. For $A\in\mathcal{A}^*_n$, we set $j_n(A)=j_n$. It is then a simple matter to deduce (9.54) from the application of Theorem 9.4.1 to the set $T^*$, using also that for $x=(a_i(\chi_i(s)-\chi_i(0)))_{i\ge1}\in T^*$, we have $|x_i|\le2|a_i|$, so that $\sum_{i\ge1}|x_i|\mathbf{1}_{\{2|x_i|\ge r^{-j_0}\}}\le2\sum_{i\ge1}|a_i|\mathbf{1}_{\{4|a_i|\ge r^{-j_0}\}}$.

Exercise 10.3.5 Let us fix $x\ge y$ and write $a_\ell=\varphi_{c\ell,c(\ell+1)}(x)-\varphi_{c\ell,c(\ell+1)}(y)$. Thus, $0\le a_\ell\le c$ and $\sum_\ell a_\ell=x-y$, so that $\sum_\ell a_\ell^2\le c\sum_\ell a_\ell=c(x-y)$ and $\sum_\ell a_\ell^2\le(\sum_\ell a_\ell)^2=(x-y)^2$, proving (10.39). Next, let $k$ be the number of $a_\ell$ which are $\ne0$. If $k\le2$, then $(\sum_\ell a_\ell)^2\le2\sum_\ell a_\ell^2$. This is in particular the case if $x\le y+c$. Furthermore, $c|x-y|=\sum_\ell ca_\ell\le kc^2$ and $\sum_\ell a_\ell^2\ge(k-2)c^2$, because all the $a_\ell$ which are not zero, except possibly two of them, are equal to $c$. Since $k\le3(k-2)$ for $k\ge3$, (10.38) follows.
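The elementary inequalities above can be tested numerically. The Python sketch below assumes that $\varphi_{a,b}(x)$ is $x$ clipped to $[a,b]$ (a hypothetical stand-in: any version differing from it by a constant on each interval gives the same $a_\ell$) and checks $0\le a_\ell\le c$, $\sum_\ell a_\ell=x-y$, $\sum_\ell a_\ell^2\le\min(c(x-y),(x-y)^2)$ and $\min(c(x-y),(x-y)^2)\le3\sum_\ell a_\ell^2$.

```python
import math
import random


def phi(a, b, x):
    """Hypothetical stand-in for phi_{a,b}: x clipped to [a, b]."""
    return min(max(x, a), b)


def increments(x, y, c):
    """a_l = phi_{cl, c(l+1)}(x) - phi_{cl, c(l+1)}(y) for all l whose interval can meet [y, x]."""
    lo = math.floor(min(x, y) / c) - 1
    hi = math.floor(max(x, y) / c) + 1
    return [phi(c * l, c * (l + 1), x) - phi(c * l, c * (l + 1), y) for l in range(lo, hi + 1)]


rng = random.Random(5)
y = rng.uniform(-5, 5)
x = y + rng.uniform(0, 5)
c = rng.uniform(0.1, 2)
a = increments(x, y, c)
q = sum(v * v for v in a)
m = min(c * (x - y), (x - y) ** 2)
print(abs(sum(a) - (x - y)) < 1e-9, q <= m + 1e-9, m <= 3 * q + 1e-9)
```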

Exercise 10.3.7 If $b<1$, it is not possible to cover $B_1$ by finitely many translates of the set $bB_2$ because given such a translate $A$, for $n$ large enough, the basis unit vector $e_n$ does not belong to $A$. Also, $\epsilon B_2+aB_1\subset(\epsilon+a)B_2$.
Exercise 10.3.8 According to Theorem 6.2.8, we can write $T\subset T_1+T_2$ with $\gamma_2(T_1)\le Lb(T)$ and $T_2\subset Lb(T)B_1$. Then $N(T_1+T_2,\epsilon B_2+Lb(T)B_1)\le N(T_1,\epsilon B_2)$, and we use Exercise 2.7.8 (c) to bound the last term.

Exercise 10.14.6 The main point is the inequality $\varphi_j(s,t)\le r^{2j}d(s,t)^2$, where $d$ is the $\ell^2$ distance. Given an admissible sequence $(\mathcal{A}_n)$ of partitions, for $A\in\mathcal{A}_n$, define $j_n(A)$ as the largest integer $j\in\mathbb{Z}$ such that $r^j\Delta(A)\le2^{n/2}$, so that $r^{-j_n(A)}\le r2^{-n/2}\Delta(A)$ and $\sum_{n\ge0}2^nr^{-j_n(A_n(t))}\le r\sum_{n\ge0}2^{n/2}\Delta(A_n(t))$, while $\varphi_{j_n(A)}(s,t)\le2^n$ for $s,t\in A\in\mathcal{A}_n$.

Exercise 10.15.3 Given the $(\epsilon_i)_{i\in I}$, we construct recursively a sequence $\sigma_1,\dots,\sigma_n,\dots$ with $\sigma_n\in\{0,1\}$ as follows. Assuming that $\sigma_1,\dots,\sigma_n$ have been constructed, let $i=(\sigma_1,\dots,\sigma_n)\in I_n$, $i_0=(\sigma_1,\dots,\sigma_n,0)\in I_{n+1}$, and $i_1=(\sigma_1,\dots,\sigma_n,1)\in I_{n+1}$. We choose $\sigma_{n+1}=0$ if $\epsilon_{i_0}=1$, and otherwise, we take $\sigma_{n+1}=1$. It is straightforward to show by induction that $\mathrm{E}\sum_{k\le n}\sum_{i\in I_k}\epsilon_it_i=\sum_{k\le n}\alpha_k/2$, where for $i\in I_k$, we have $t_i=\alpha_k$ if $i=(\sigma_1,\dots,\sigma_k)$ and $t_i=0$ otherwise.
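The expectation $1/2$ per level can be observed by simulation. Since the signs attached to nodes that have not yet been examined are independent of everything seen so far, the sketch below (seeded Monte Carlo; depth, number of runs and the weights $\alpha_k=2^{-k}$ are arbitrary choices) draws each $\epsilon_i$ only when the construction looks at it. The empirical mean at every level should then be close to $1/2$, and the mean of $\sum_k\alpha_k\epsilon_{(\sigma_1,\dots,\sigma_k)}$ close to $\sum_k\alpha_k/2$.

```python
import random


def run(depth, rng):
    """One run of the recursive choice, sampling each sign only when it is looked at.
    Returns the signs eps_{(sigma_1, ..., sigma_k)} collected for k = 1, ..., depth."""
    collected = []
    for _ in range(depth):
        eps_i0 = rng.choice((-1, 1))      # sign of the child i0 = (sigma_1, ..., sigma_n, 0)
        if eps_i0 == 1:
            collected.append(eps_i0)      # sigma_{n+1} = 0
        else:
            eps_i1 = rng.choice((-1, 1))  # sigma_{n+1} = 1; eps_{i1} is independent of the past
            collected.append(eps_i1)
    return collected


rng = random.Random(7)
runs = [run(6, rng) for _ in range(20_000)]
level_means = [sum(r[k] for r in runs) / len(runs) for k in range(6)]
print([round(m, 3) for m in level_means])
```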

Exercise 11.6.2 The argument is given at the beginning of the proof of 
Theorem 11.7.1. 

Exercise 11.12.2 (a) Since $\delta(T)=\delta(\operatorname{conv}T)$ and by Theorem 11.12.1. (b) Taking $A=\gamma_1(T,d_\infty)$, the inequality $\gamma_2(T,d_2)\le A/\sqrt\delta$ is satisfied for $\delta$ small enough, but $\gamma_1(\operatorname{conv}T,d_\infty)$ may be much larger than $\gamma_1(T,d_\infty)$.

Exercise 11.12.4 The use of (11.56) gives a bound of $-\log\mathrm{P}(\sum_{i\in I}\delta_i\ge u)$ of order $\min(u^2/(\delta\operatorname{card}I),u)$, which can match the bound (11.70) only when $u$ is not much larger than $\delta\operatorname{card}I$.

Exercise 12.1.2 Given $m\ge k$, the probability of finding $k$ points in $A$ among $m$ points is $p_m:=\binom{m}{k}\mathrm{P}(A)^k(1-\mathrm{P}(A))^{m-k}$, so the probability that $\operatorname{card}(A\cap\Pi)=k$ is $\sum_{m\ge k}\mathrm{P}(M=m)p_m$, which you compute to be $\exp(-\mathrm{P}(A))\mathrm{P}(A)^k/k!$. As for the property of Lemma 12.1.1, it is simply because given $M$ and $I=\{i\le M ; Y_i\in A\}$, the points $(Y_i)_{i\in I}$ are i.i.d. in $A$, distributed according to the probability of this lemma. When $\Omega$ is $\sigma$-finite but not finite, you break $\Omega$ into a countable disjoint union of sets $\Omega_n$ of finite measure, define independent random sets $\Pi_n$ in each of them according to the previous procedure, set $\Pi=\bigcup_n\Pi_n$, and check that this works by similar arguments.
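The computation of the law of $\operatorname{card}(A\cap\Pi)$ is a Poisson thinning identity. Assuming, as the formula $\exp(-\mathrm{P}(A))\mathrm{P}(A)^k/k!$ indicates, that $M$ is Poisson with mean $1$ (a mean $\lambda$ would give a Poisson law of mean $\lambda\mathrm{P}(A)$), the sketch below evaluates the series numerically, truncated at a large value of $m$, and compares it with the Poisson probabilities; $p$ plays the role of $\mathrm{P}(A)$.

```python
import math


def thinned_count_pmf(k, p, lam=1.0, m_max=200):
    """P(card(A intersect Pi) = k) when the number M of points is Poisson(lam) and each point
    falls in A independently with probability p: sum_m P(M = m) C(m, k) p^k (1 - p)^(m - k)."""
    total = 0.0
    pm = math.exp(-lam)          # P(M = 0)
    for m in range(m_max + 1):
        if m > 0:
            pm *= lam / m        # P(M = m) from P(M = m - 1)
        if m >= k:
            total += pm * math.comb(m, k) * p ** k * (1 - p) ** (m - k)
    return total


def poisson_pmf(k, mu):
    return math.exp(-mu) * mu ** k / math.factorial(k)


for k in range(4):
    print(k, thinned_count_pmf(k, 0.3), poisson_pmf(k, 0.3))
```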

Exercise 12.3.6 For the Lévy measure $\nu$ of a $p$-stable process, it never happens that $\int|\beta(t)|\wedge1\,d\nu(\beta)<\infty$ for all $t\in T$ unless this Lévy measure is concentrated at zero, because for $a\ne0$, the integral $\int|ax^{-1/p}|\wedge1\,dx$ is divergent. In fact, it is not difficult to show that when $\int|\beta(t)|\wedge1\,d\nu(\beta)=\infty$, if the sequence $(Z_i)$ is generated by a Poisson point process of intensity $\nu$, then $\sum_i|Z_i(t)|=\infty$ a.s.

Exercise 12.3.14 Really straightforward from the hint. 

Exercise 12.3.15 It is better to use the functional $\gamma^*(T,d)$ of (5.21), which in the homogeneous case is just $\sum_{n\ge0}e_n(T)$, and the result follows by (7.5).

Exercise 12.3.16 By hypothesis, $\nu$ is the image of the measure $\mu\otimes m$ on $\mathbb{R}^+\times\mathcal{C}$ under the map $(x,\beta)\mapsto x\beta$, where $m$ is supported by $G$, and $\mu$ has density $x^{-p-1}$ with respect to Lebesgue’s measure. The result follows since $\beta(0)=1$ for $\beta\in G$.
Exercise 12.3.17 Combining Exercise 12.3.14 with Theorem 12.3.13, the only
point which is not obvious is that for $p > 1$, we have $\mathsf E \sup_{t \in T} |X_t| < \infty$. Reducing
to the case where the Lévy measure is the measure $\nu'$ of Theorem 12.3.11, this
follows from the fact that $\mathsf E|X_0| \le \mathsf E \sum_i |Z_i(0)| = \int |\beta(0)| \, d\nu'(\beta) < \infty$ by
Exercise 12.3.16.

Exercise 12.3.18 As in the previous exercise, we reduce to the case where
the Lévy measure is the measure $\nu'$ of Theorem 12.3.11, and we have to prove
an estimate $\mathsf P(\sum_i |Z_i(0)| \ge u) \le C/u$. Consider the probability $\mathsf P_N$, the
conditional probability given the event that there are $N$ numbers $Z_i$. We bound
crudely $\mathsf P_N(\sum_i |Z_i(0)| \ge u) \le N \mathsf P_N(|Z_1(0)| \ge u/N) \le CN^2/u$, using both
Exercises 12.1.2 and 12.3.16, and the result follows by summation over $N$.

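Exercise 13.2.2 Consider the subset $A'$ of $A$ defined as follows: a sequence
$(\delta_i)_{i \le M}$ is in $A'$ if and only if $\delta_i = 0$ when $i$ does not belong to any one of the sets
$I_\ell$ and $\sum_{i \in I_\ell} \delta_i = 1$ for each $\ell \le r$. The set $A'$ obviously identifies with $\prod_{\ell \le r} I_\ell$.
Consider the uniform probability measure $\nu$ on $A'$. Consider a set $J \subset \{1, \ldots, M\}$.
Then $A' \cap H_J = \emptyset$ unless $J \subset \bigcup_{\ell \le r} I_\ell$ and $\mathrm{card}(J \cap I_\ell) \le 1$ for each $\ell \le r$, and in that
case, $\nu(H_J) = k^{-\mathrm{card}\,J}$ (as belonging to $H_J$ amounts to fixing $\mathrm{card}\,J$ coordinates in
the product $\prod_{\ell \le r} I_\ell$). Thus, $\nu(H_J) \le k^{-\mathrm{card}\,J}$ for each $J$. Hence if $A \subset \bigcup_{I \in \mathcal I} H_I$,
then $\sum_{I \in \mathcal I} k^{-\mathrm{card}\,I} \ge 1$.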
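The identity $\nu(H_J) = k^{-\mathrm{card}\,J}$ can be confirmed by brute force on a small instance. The sketch below is hypothetical in its concrete choices. The blocks $I_\ell$ are taken to be $k$ consecutive integers, one extra index lies in no block, and $H_J$ is read as the set of sequences with $\delta_i = 1$ for all $i \in J$. Summing $\nu(H_I) \le k^{-\mathrm{card}\,I}$ over a cover of $A'$ is then the union bound behind the final inequality.

```python
from fractions import Fraction
from itertools import combinations, product


def check_nu_of_H(k: int, r: int) -> bool:
    """Brute-force check of nu(H_J) on a small hypothetical instance.

    Blocks I_l = {l*k, ..., l*k + k - 1} for l < r, plus one extra index
    outside every block. A point of A' picks, in each block, the unique index
    with delta_i = 1; H_J is the set of sequences with delta_i = 1 for i in J.
    """
    blocks = [set(range(l * k, (l + 1) * k)) for l in range(r)]
    n_indices = k * r + 1  # the last index belongs to no block
    points = [set(p) for p in product(*(sorted(b) for b in blocks))]
    for size in range(r + 2):
        for J in combinations(range(n_indices), size):
            J = set(J)
            # nu is uniform on A', so nu(H_J) is a fraction of the points
            nu = Fraction(sum(1 for p in points if J <= p), len(points))
            covered = all(any(i in b for b in blocks) for i in J)
            spread = all(len(J & b) <= 1 for b in blocks)
            expected = Fraction(1, k ** len(J)) if covered and spread else Fraction(0)
            if nu != expected:
                return False
    return True


print(check_nu_of_H(3, 3))  # expected: True
```

Exact `Fraction` arithmetic avoids any floating-point tolerance in the comparison.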

Exercise 13.4.2 The class $\mathcal J$ consisting of $M$ disjoint sets of cardinality $k$ satisfies
$\delta(\mathcal J) \le k$. Fixing $\delta$ and $k$, $S_\delta(\mathcal J)$ is such that $M(\delta k/S_\delta(\mathcal J))^k \le 1$, so that it goes
to $\infty$ as $M \to \infty$.

Exercise 13.4.3 Consider integers $2 \ll N_2 \ll N_1$ and a number $\delta > 0$ such
that $S := 4\delta N_1 = 2$. Consider the class $\mathcal I_0$ of sets consisting of one single set $D$
of cardinality $N_1$ and of $M$ disjoint sets $(B_i)_{i \le M}$, each of cardinality $N_2$, where $M$
is the largest integer with $M(N_2/N_1)^{N_2} \le 1/2$ (so that $M \ge (N_1/N_2)^{N_2}/4$). It is
straightforward that $S_\delta(\mathcal I_0) \le S$. Consider the class $\mathcal I$ consisting of sets which are the
union of two sets of $\mathcal I_0$. Then $\delta(\mathcal I) \le 2\delta(\mathcal I_0) \le LS$. Consider now a class $\mathcal J$ of
sets such that $\mathcal I \subset \mathcal J(1, m)$. The goal is to prove that given $A > 0$, for suitable
choices of $N_1$ and $N_2$, we have $S_\delta(\mathcal J) + m \ge AS$. Assume for contradiction that
$S_\delta(\mathcal J) + m < AS$, so that $m < AS$ and


$$\sum_{J \in \mathcal J} \Big(\frac{\delta\,\mathrm{card}\,J}{AS}\Big)^{\mathrm{card}\,J} \le 1 . \tag{F.3}$$


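In particular, $\mathrm{card}\,J \le AS/\delta = 4AN_1$ for $J \in \mathcal J$, so that

$$\mathrm{card}\{i \le M ; J \cap B_i \ne \emptyset\} \le 4AN_1 . \tag{F.4}$$

Since $\mathcal I \subset \mathcal J(1, m)$, given $I \in \mathcal I$, there exists $J \in \mathcal J$ with $\mathrm{card}(I \setminus J) \le m$.
In particular, given $i \le M$, there exists $J_i \in \mathcal J$ with $\mathrm{card}((D \cup B_i) \setminus J_i) \le m$.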


Assume now $N_2 > AS > m$. When $\mathrm{card}((D \cup B_i) \setminus J_i) \le m$, since $\mathrm{card}\,B_i = N_2$,
we must have $J_i \cap B_i \ne \emptyset$. Combining with (F.4) shows that there are at least
$M/(4AN_1)$ different sets $J_i$. For each of these sets $J_i$, we have $\mathrm{card}(D \setminus J_i) \le m$, so
that $\mathrm{card}\,J_i \ge N_1/2$, and since $S = 4\delta N_1$, this implies $\delta\,\mathrm{card}\,J_i/(AS) \ge 1/(8A)$. The
sum in (F.3) is then at least


me Gone) a 
4AN, \8A ~ 16AN, \N2 8A , 
and since $S = 2$, this cannot be $\le 1$ when $N_1 \gg N_2$. In conclusion, given a number
$A$, we choose $N_2 > 2A$, $N_1$ large enough, and then $\delta$ so that $S = 4\delta N_1 = 2$, and
the previous construction provides a class $\mathcal I$ such that $S_\delta(\mathcal J) + m \ge AS$ whenever
$\mathcal I \subset \mathcal J(1, m)$.

Exercise 14.1.3 Defining b,(F) = inffe > 0, N, \(F, €) < Nn}, the right-hand 
side of (14.8) is at least of order $\sum_{n \ge 0} 2^{n/2} b_n(\mathcal F)$, by the same argument as was
used to prove (2.40). One then proceeds exactly as in Exercise 2.7.6 to prove that
the previous quantity dominates the quantity (14.6).

Exercise 14.2.1 (a) $\int \exp(|f|/A) \, d\mu \le (\int \exp|f| \, d\mu)^{1/A} \le 2$. (b) should be
obvious to you at this stage. (c) Since for $x \ge 0$ we have $\exp x \le \exp(1/4)\exp x^2 \le
2\exp x^2$, when $\|f\|_{\psi_2} \le 1$, we have $\int \exp|f| \, d\mu \le 4$, and (14.16) follows by (14.15). To
prove (14.17), assume $\|f_1\|_{\psi_2} \le 1$, $\|f_2\|_{\psi_2} \le 1$, and use that $|f_1 f_2| \le f_1^2/2 +
f_2^2/2$ and the Cauchy-Schwarz inequality. (d) It is elementary that for a standard Gaussian $g$, $\mathsf E\exp(g^2/A^2) =
A/\sqrt{A^2 - 2}$ for $A > \sqrt 2$, so this norm is the positive solution of the equation
$A^2 = 4(A^2 - 2)$, that is, $A = \sqrt{8/3}$. (e) Combine the subgaussian inequality and
(b). For the rest, see the next exercise.
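A short numerical check of part (d) follows. It assumes, as the equation in the text indicates, that the norm is the smallest $A$ with $\mathsf E\exp(g^2/A^2) \le 2$. The first function approximates the Gaussian integral by the trapezoidal rule. The second solves $A/\sqrt{A^2 - 2} = 2$ by bisection. The step count and the cutoff are ad hoc choices for this sketch.

```python
import math


def expected_exp_square(A: float, n: int = 20_000, cutoff: float = 60.0) -> float:
    """Trapezoidal approximation of E exp(g^2 / A^2) for a standard Gaussian g
    (the expectation is finite only when A > sqrt(2))."""
    h = 2 * cutoff / n
    total = 0.0
    for i in range(n + 1):
        x = -cutoff + i * h
        weight = 0.5 if i in (0, n) else 1.0
        # integrand exp(x^2 / A^2) times the Gaussian density (up to 1/sqrt(2 pi))
        total += weight * math.exp(x * x / (A * A) - x * x / 2)
    return total * h / math.sqrt(2 * math.pi)


def psi2_norm_of_gaussian(tol: float = 1e-12) -> float:
    """Bisection for the A > sqrt(2) solving A / sqrt(A^2 - 2) = 2;
    the left-hand side decreases in A on (sqrt(2), infinity)."""
    lo, hi = math.sqrt(2) + 1e-9, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid / math.sqrt(mid * mid - 2) > 2:
            lo = mid
        else:
            hi = mid
    return hi
```

Bisection is used only because it needs nothing beyond monotonicity. The exact root $\sqrt{8/3}$ follows directly from $A^2 = 4(A^2 - 2)$.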

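Exercise 14.2.3 The reader should review the proof of Theorem 6.7.2. We
consider the same chaining as in the proof of Theorem 14.2.2. Along the chains,
we decompose each function $\pi_n(f) - \pi_{n-1}(f)$ as $f_{n,1} + f_{n,2}$, where
$f_{n,1} = (\pi_n(f) - \pi_{n-1}(f))\mathbf 1_{\{|\pi_n(f) - \pi_{n-1}(f)| \le 2^{-n/2}\sqrt N \Delta(A_{n-1}(f), \psi_2)\}}$. We define $\mathcal F_1$ as the set of
sums $\sum_{n \le n_1} f_{n,1}$. By the method of Theorem 14.2.2, we then have $\gamma_2(\mathcal F_1, \psi_2) \le
L\gamma_2(\mathcal F, \psi_2)$ and $\gamma_1(\mathcal F_1, d_\infty) \le \sqrt N \gamma_2(\mathcal F, \psi_2)$. Let $\mathcal F_2$ be the set of sums
$\sum_{n \le n_1} f_{n,2}$. It remains to bound $\mathsf E \sup_{f \in \mathcal F_2} \sum_{i \le N} |f(X_i)|$. This will finish the
proof since the chaining beyond $n_1$ involves no cancellations, as is shown in the proof
of Theorem 6.7.2. The bound follows from Lemma 14.2.4 and the following observation:
if $\|f\|_{\psi_2} \le 1$ and $a \ge 1$, then $\|f\mathbf 1_{\{|f| \ge a\}}\|_{\psi_1} \le L/a$. Thus,
$\mathsf P\big(\sum_{i \le N} |f_{n,2}(X_i)| \ge u 2^{n/2}\sqrt N \Delta(A_{n-1}(f), \psi_2)\big) \le \exp(-LNu)$, etc.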
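Exercise 14.2.10 As always, we start with the relation $\exp x \le 1 + x + x^2\exp|x|$,
which is obvious on power series expansions. Thus, using $\mathsf E Y = 0$ and Hölder’s inequality, we
get

$$\mathsf E\exp(\lambda Y) \le 1 + \lambda^2 \mathsf E\, Y^2 \exp|\lambda Y| \le 1 + \lambda^2 (\mathsf E\, Y^6)^{1/3}\big(\mathsf E\exp(3|\lambda Y|/2)\big)^{2/3} .$$

It then remains to prove that $\mathsf E\, Y^6 \le LA^6$ and $\mathsf E\exp(3|\lambda Y|/2) \le L\exp(\lambda^2 A^2)$, which
follows from a routine use of (2.6).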
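The elementary inequality at the start of this solution can be spot-checked on a grid. The comment in the sketch records the power series argument: the tail of $e^x$ after $1 + x$ is bounded termwise by $x^2 e^{|x|}$, since $(j+2)! \ge j!$. A grid check is of course only an illustration, not a proof.

```python
import math


def gap(x: float) -> float:
    """1 + x + x^2 e^{|x|} - e^x; the inequality of the text says this is >= 0."""
    return 1 + x + x * x * math.exp(abs(x)) - math.exp(x)


# Termwise: e^x - 1 - x = sum_{n>=2} x^n/n! <= x^2 sum_{j>=0} |x|^j/(j+2)! <= x^2 e^{|x|}.
grid = [i / 100 for i in range(-2000, 2001)]
worst = min(gap(x) for x in grid)
print(worst)
```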
Exercise 15.1.9 (a) Consider an admissible sequence $(\mathcal A_n)$ of partitions of $T$ such
that

$$\forall t \in T, \quad \sum_{n \ge 0} 2^{n/2}\Delta(A_n(t), d_\omega) \le 2\gamma_2(T, d_\omega) . \tag{F.5}$$


Forn > 1 set T/ =U{AND; A € An; D € Dy, W(AN D) < 2N7,,)}. Since 


ys Ae 2 _. —| _ =2 
T, is the union of < Nz = Ny+1 sets of measure < 2N42 = 2Nnuy 


w(T,) < 2N,,),. Also, fort € T, = T \ Ty, we have 1(An(t)  Da(t)) = 2N7, 
so that Nn (t) < A(An(t), dy). Thus, 


we have 


f DP nat) 22 NAT da) [ 2"? A(An(t), do)du(t) « 


Summing over $n \ge 0$ and since $\Delta(T, d_\omega) \le L\gamma_2(T, d_\omega)$, we obtain the result, using
also (F.5). (b) It suffices to assume that $\epsilon_n(t) \ge K_1 2^{n/2}\Delta(D_n(t), d_1)$. By definition
of $\epsilon_n(t)$,


i({s € Dn(t) ; d(s,t) < On(t)/2}) < ws; d(s, t) < On(t)/2}) < Noy - 


If $s \in D_n(t)$ and $d(s, t) \ge \epsilon_n(t)/2$, then $d(s, t) \ge K_1 2^{n/2}\Delta(D_n(t), d_1)/2 \ge
K_1 2^{n/2} d_1(s, t)/2$. Thus, the right-hand side of (15.23) is $\le \exp(-2^n)$ for a suitable
choice of $K_1$. As usual, this implies that with large probability, we have


w({s € Dn(t); d(s, t) < w6n(t)/2)}) < a! exp(—2"), 


and then by Fubini theorem that with large probability w({s € Dnt) < 
aén(t)/2}) < aN so that no (t) > a6, (t)/2 and (15.28) is proved. Combining 
the preceding, we have 


/ > 2"/7€,(t)du(t) < K sup 2” A(Dn(t), d1) + KEy2(T, dw) - 


n>0 teT n>0 


By an appropriate choice of the D,, we obtain je baer, 2" e,(t)du(t) < 
Ky\(T, d\) + KEy2(T,d.) and consequently fp I,(@)du(t) < Kyi(T, di) + 
KEy2(T, dy). The result follows since yz is arbitrary. 

Exercise 15.1.14 First, we show that $\Delta(T, d) \le KM$. Indeed, for $s, t \in T$,
by (15.22), we have $\mathsf P(d_\omega(s, t) \ge \alpha d(s, t)) \ge \alpha$, so that according to (15.24), with
positive probability, we have at the same time $d_\omega(s, t) \ge \alpha d(s, t)$ and $\gamma_2(T, d_\omega) \le
M$; since $d_\omega(s, t) \le L\gamma_2(T, d_\omega)$, we have $d(s, t) \le KM$. Thus, $\Delta(T, d) \le
K_1 M$. Consequently, $N(T, d, \epsilon) = 1$ for $\epsilon \ge K_1 M$, and thus $\epsilon\sqrt{\log N(T, d, \epsilon)} = 0$.

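Consider $0 < \epsilon < K_1 M$ and assume that we can find points $t_1, \ldots, t_N$ of
$T$ with $d(t_i, t_j) \ge \epsilon > 0$ for $i \ne j$. Then by (15.23), for $i \ne j$, we have

$$\mathsf P(d_\omega(t_i, t_j) \le \alpha\epsilon) \le \frac{1}{\alpha}\exp\Big(-\frac{\alpha\epsilon^2}{\Delta(T, d_1)^2}\Big) .$$

For $\epsilon \ge K\sqrt{\log N}\,\Delta(T, d_1)$, the right-hand side is $\le \alpha/(2N^2)$. According to (15.24), with positive probability,
we have at the same time $d_\omega(t_i, t_j) \ge \alpha\epsilon$ for all $i \ne j$ and $\gamma_2(T, d_\omega) \le M$. Since
$\epsilon\alpha\sqrt{\log N} \le L\gamma_2(T, d_\omega)$, this proves that $\epsilon\sqrt{\log N} \le KM$. That is, we have proved
that

$$\sqrt{\log N} \le \frac{\epsilon}{K_2\Delta(T, d_1)} \implies \sqrt{\log N} \le \frac{K_2 M}{\epsilon} , \tag{F.6}$$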


and without loss of generality, we may assume that $K_2 \ge K_1$. Let us assume now
that $\epsilon/(K_2\Delta(T, d_1)) \ge 2K_2M/\epsilon$ (or, equivalently, $\epsilon \ge K_2\sqrt{2\Delta(T, d_1)M}$). Observe
that then, since $\epsilon \le K_1 M$, we have $2K_2M/\epsilon \ge 2$. Let us prove that we must have
$\sqrt{\log N} \le 2K_2M/\epsilon$. Otherwise, we may replace $N$ by the largest integer $N'$ for
which $\sqrt{\log N'} \le 2K_2M/\epsilon$. Then $2K_2M/\epsilon < \sqrt{\log(N' + 1)} \le 2\sqrt{\log N'}$, and
thus we obtain the relations $K_2M/\epsilon < \sqrt{\log N'} \le 2K_2M/\epsilon \le \epsilon/(K_2\Delta(T, d_1))$,
which contradict (F.6). Thus, $\sqrt{\log N(T, d, \epsilon)} \le KM/\epsilon$, which concludes the proof.
The application to Proposition 15.1.13 is straightforward as (15.24) holds for $M =
LS(T)$.

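Exercise 15.1.17 Indeed (15.44) implies $\log N(T, d_\infty, \alpha) \le LS(T)^2/\alpha^2$. Now,
if $B$ is a ball $B_\infty(t, \alpha)$ of $T$ for $\alpha = \epsilon^2/(L'S(T))$, since $\Delta(B_\infty(t, \alpha), d_\infty) \le 2\alpha$,
for $L'$ large enough, the right-hand side of (15.40) holds and this inequality
implies $\log N(B, d_2, \epsilon) \le LS(T)^2/\epsilon^2$. Since $N(T, d_2, \epsilon) \le N(T, d_\infty, \alpha) \times
\max_{t \in T} N(B_\infty(t, \alpha), d_2, \epsilon)$, combining these yields

$$\log N(T, d_2, \epsilon) \le L\Big(\Big(\frac{S(T)}{\epsilon}\Big)^4 + \Big(\frac{S(T)}{\epsilon}\Big)^2\Big) .$$

Thus, $N(T, d_2, \epsilon) = 1$ for $\epsilon \ge LS(T)$. For $\epsilon \le LS(T)$, the term $(S(T)/\epsilon)^4$
dominates, and this implies (15.43).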
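The fourth power in the bound comes from the choice of $\alpha$. Assuming (15.44) reads $\log N(T, d_\infty, \alpha) \le LS(T)^2/\alpha^2$, the substitution is

$$\alpha = \frac{\epsilon^2}{L'S(T)} \quad\Longrightarrow\quad \log N(T, d_\infty, \alpha) \le \frac{LS(T)^2}{\alpha^2} = LL'^2\Big(\frac{S(T)}{\epsilon}\Big)^4 ,$$

and adding $\log\max_{t \in T} N(B_\infty(t, \alpha), d_2, \epsilon) \le L(S(T)/\epsilon)^2$ produces the two terms in the bound on $\log N(T, d_2, \epsilon)$.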

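Exercise 15.2.2 To each tensor $A$, we associate the r.v. $X_A$ given by (15.55).
We deduce from (15.69) that $\|X_A\|_p \le K(d)\sum_{1 \le k \le d} p^{k/2}\|A\|_{(k)}$, where $\|A\|_{(k)} =
\sum_{\mathrm{card}\,\mathcal P = k}\|A\|_{\mathcal P}$. To turn this inequality into a tail estimate, we use that $t\,\mathsf P(|X_A| \ge
t)^{1/p} \le \|X_A\|_p$, so that

$$\log \mathsf P(|X_A| \ge t) \le p\log\Big(\frac{K(d)}{t}\sum_{1 \le k \le d} p^{k/2}\|A\|_{(k)}\Big) .$$

We then take $p = (1/K(d))\min_{1 \le k \le d}(t/\|A\|_{(k)})^{2/k}$ to obtain a bound

$$\mathsf P(|X_A| \ge t) \le K(d)\exp\Big(-\frac{1}{K(d)}\min_{1 \le k \le d}\Big(\frac{t}{\|A\|_{(k)}}\Big)^{2/k}\Big) .$$

Considering a set $T$ of tensors $A$ and $d_k$ the distance on $T$ induced by the norm
$\|\cdot\|_{(k)}$, we then have the bound $\mathsf E\sup_{A \in T} X_A \le K(d)\sum_{1 \le k \le d}\gamma_{2/k}(T, d_k)$.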

Exercise 15.2.8 Since $A$ is symmetric, we have $a(y) = \sup\{\langle x, y\rangle ; x \in A\}$, so
that $\mathsf E a(G) = \mathsf E\sup_{x \in A}\langle x, G\rangle = g(A)$ by definition of that quantity.
Exercise 15.2.16 We have $\|\langle A, G'\rangle\|_{\{2,3\}} = \big(\sum_{j,k}(\sum_i a_{i,j,k}g'_i)^2\big)^{1/2}$, so that by
the Cauchy-Schwarz inequality, $\mathsf E\|\langle A, G'\rangle\|_{\{2,3\}} \le \big(\sum_{i,j,k} a_{i,j,k}^2\big)^{1/2} = \|A\|_{\{1,2,3\}}$.

Exercise 16.2.2 It is obvious that $N(T, d, 2^{-n/2}) \le M_n := \mathrm{card}\{I \in \mathcal I_n ; I \cap
T \ne \emptyset\}$. On the other hand, one has $M_n \le 2N(T, d, 2^{-n/2})$. Indeed, if the intervals
$I \in \mathcal I_n$ which meet $T$ are numbered as $I_1, I_2, \ldots$ from left to right, then any points
in $I_1, I_3, I_5, \ldots$ are at mutual distances $\ge 2^{-n/2}$. We have already proved (d) as a
consequence of (16.5). To prove (c), one simply uses the Cauchy-Schwarz inequality
to obtain $\sum_{I \in \mathcal I_n}\sqrt{2^{-n}\mu(I)} \le \sqrt{2^{-n}\,\mathrm{card}\{I \in \mathcal I_n ; I \cap T \ne \emptyset\}}$.

Exercise 16.3.4 We observe that the function x(log(2/x))* increases for x 
small. Distinguishing whether a, > 1/n? or not, we get ap(log(2/an))? < 
La, (log ny? + Ln-2 (log n)’, and by summation, this proves that (16.16) 
implies (16.21)! Assuming now that the sequence (a,) decreases, (16.21) implies 
that )>, 24a. (log(2/ayx))?_ < 00 so that ay. < C27* and for k large enough 
log(2/ayx) > k/L and thus )>, 2k ans (log k)* < 00 which implies (16.16). 

Exercise 16.3.5 The recursion formula is simply 


1 
+ max M(TN/;). 


J 2" unr (Ty) 


Exercise 16.5.3 Take $\epsilon_n = 2^{-n}\Delta(T, d)$ and $T_n$ with $\mathrm{card}\,T_n = N(T, d, \epsilon_n)$ and
$d(t, T_n) \le \epsilon_n$ for $t \in T$.

Exercise 16.8.17 Let us follow the hint. Then given $s, t$, we have $|X_s - X_t| =
\varphi^{-1}(N/(2\epsilon))$ on the set $\Omega_s \cup \Omega_t$ of probability $2\epsilon/N$ and zero elsewhere, and
thus, $\mathsf E\varphi(X_s - X_t) \le 1$. The r.v. $\sup_{s,t}|X_s - X_t|$ equals $\varphi^{-1}(N/(2\epsilon))$ on the
set $\bigcup_{s \in T}\Omega_s$ of probability $\epsilon$ and zero elsewhere, so that for $A > 0$, we have
$\|\sup_{s,t}|X_s - X_t|\|_\psi \le A \Rightarrow \epsilon\psi(\varphi^{-1}(N/(2\epsilon))/A) \le 1$. Since in our case the
integral in the right-hand side of (16.177) is $\varphi^{-1}(N)$, this inequality implies that for
all choices of $N$ and $\epsilon$, one must have $\epsilon\psi(\varphi^{-1}(N/(2\epsilon))/(L\varphi^{-1}(N))) \le 1$. Setting
$x = L\varphi^{-1}(N)$ and $y = \varphi^{-1}(N/(2\epsilon))/x$ and eliminating $\epsilon$ and $N$ yields the relation
$\varphi(x/L)\psi(y) \le 2\varphi(xy)$, which is basically (16.170).
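The elimination of $\epsilon$ and $N$ in the last step can be spelled out, assuming $\varphi$ is invertible on the relevant range as the notation $\varphi^{-1}$ presupposes. Setting $x = L\varphi^{-1}(N)$ and $xy = \varphi^{-1}(N/(2\epsilon))$ means $N = \varphi(x/L)$ and $\epsilon = N/(2\varphi(xy))$, so

$$\epsilon\,\psi\Big(\frac{\varphi^{-1}(N/(2\epsilon))}{L\varphi^{-1}(N)}\Big) \le 1 \iff \frac{\varphi(x/L)}{2\varphi(xy)}\,\psi(y) \le 1 \iff \varphi(x/L)\,\psi(y) \le 2\varphi(xy) .$$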

Exercise 16.9.2 It follows from the Cauchy-Schwarz inequality that for
any probability $\mu$, we have $1 \le \int_0^1 \mu(B(t, \epsilon))\,dt \int_0^1 dt/\mu(B(t, \epsilon))$. Now, by
Fubini’s theorem, denoting by $\lambda$ Lebesgue’s measure on $[0, 1]$, $\int_0^1 \mu(B(t, \epsilon))\,dt =
\int d\mu(u)\,\lambda(B(u, \epsilon)) \le 2\epsilon$, and thus $\int_0^1 dt/\mu(B(t, \epsilon)) \ge 1/(2\epsilon)$. Integrating in $\epsilon$
yields the result.
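A quick numerical illustration of the two bounds follows, for a hypothetical discrete probability $\mu$ on $[0, 1]$ with random atoms and weights. It takes $B(t, \epsilon) = \{u ; |u - t| \le \epsilon\}$. With a midpoint rule of step $h$, the first integral is at most $2\epsilon + h$. The discrete Cauchy-Schwarz inequality gives the second bound exactly for the Riemann sums.

```python
import random


def ball_integrals(atoms, weights, eps, n=10_000):
    """Midpoint-rule approximations of int_0^1 mu(B(t, eps)) dt and
    int_0^1 dt / mu(B(t, eps)) for the discrete probability
    mu = sum_j weights[j] * (point mass at atoms[j]) on [0, 1]."""
    h = 1.0 / n
    mass_integral = 0.0
    inverse_integral = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        m = sum(w for u, w in zip(atoms, weights) if abs(u - t) <= eps)
        mass_integral += m * h
        # an empty ball makes the second integral infinite, which is consistent with the bound
        inverse_integral += h / m if m > 0 else float("inf")
    return mass_integral, inverse_integral


random.seed(0)
atoms = [random.random() for _ in range(50)]
raw = [random.random() for _ in range(50)]
weights = [w / sum(raw) for w in raw]
eps = 0.05
mass, inverse = ball_integrals(atoms, weights, eps)
print(mass, inverse)
```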

Exercise 16.9.3 For $k \ge 1$ and $t \in T$, define the function $Y_{k,t}$ by $Y_{k,t}(s) = 1$
if $d(s, t) \le 2^{-k}$ and $Y_{k,t}(s) = 0$ otherwise. Consider $Y_{k,t}$ as a r.v. on the basic
probability space $(T, \mu)$, where $\mu$ is the uniform measure on $T$. Consider the r.v.
$X_t = \sum_{k \ge 1} Y_{k,t}$. It should be obvious that $\sup_t X_t = \infty$. On the other hand,


M(T) = 


' This is also a consequence of Corollaries 16.3.1 and 16.3.2, but the direct proof is much clearer. 
consider $s, t \in T$ with $d(s, t) = 2^{-\ell}$. Then $Y_{k,s} = Y_{k,t}$ for $k \le \ell$. Thus,
$\mathsf E|X_s - X_t| \le \sum_{k > \ell}\mathsf E(Y_{k,s} + Y_{k,t}) = \sum_{k > \ell} 2^{-k+1} = 2^{-\ell+1} = 2d(s, t)$.

Exercise 16.9.4 Simply replace the inequality (16.129) by the simpler $|X_s -
X_{\theta_n(s)}| \le d(s, \theta_n(s))\,(|X_s - X_{\theta_n(s)}|/d(s, \theta_n(s)))$, and follow the same proof.

Exercise 18.1.4 According to (18.11), for each $u \ge 1$, the convex set $C(u) :=
\{\psi \le u\}$ is invariant under the symmetries $(x_1, x_2, x_3) \mapsto (\pm x_1, \pm x_2, \pm x_3)$. Consider
the smallest box $B = [-a, a] \times [-b, b] \times [-c, c]$ containing $C(u)$. Then the points
$(\pm a, 0, 0)$, $(0, \pm b, 0)$, $(0, 0, \pm c)$ belong to $C(u)$, and thus $B/3 \subset C(u)$. Given
$k \ge 0$, let us define $(m_j(k))_{j \le 3}$ as the smallest integers such that $C(N_k) \subset B(k) :=
[-2^{m_1(k)}, 2^{m_1(k)}] \times [-2^{m_2(k)}, 2^{m_2(k)}] \times [-2^{m_3(k)}, 2^{m_3(k)}]$. Observe the fundamental
fact that $B(k)/6 \subset C(N_k)$. According to (18.10), we have $2^{m_1(k) + m_2(k) + m_3(k) + 3} \ge
\log N_k \ge 2^{k-1}$, so that $m_1(k) + m_2(k) + m_3(k) \ge k - 4$. Also, $m_j(k) \ge 0$
from (18.12). It should be clear that we can find sequences $(n_j(k))_{j \le 3}$ such
that $n_j(k) \le m_j(k + 4)$ which satisfy (18.2). Observe then that $S_k \subset B(k + 4)$.
Recalling (18.3), we consider the function $\varphi$ given by (18.5), and we proceed to
prove that $\psi(x) \le \varphi(Lx)$. Since $\varphi \ge 1$, this is always the case when $\psi(x) \le 1$.
It follows from (18.12) that the set $\{\psi \le 1\}$ contains the set $D = [-1/3, 1/3]^3$, so
that if $L$ is large enough, $S_5/L \subset D$, so that $\psi(x) > 1 \Rightarrow \varphi(Lx) \ge N_5$, and it suffices
to consider the case $\psi(x) > N_5$. Consider then the largest $k$ such that $\psi(x) > N_k$,
so $k \ge 5$. Then $S_{k-4}/6 \subset B(k)/6 \subset C(N_k)$ and then $S_{k+1}/L \subset C(N_k)$. Since
$x \notin C(N_k)$, this proves that $\varphi(Lx) \ge N_{k+1}$, so that $\varphi(Lx) \ge \psi(x)$ by definition
of $k$.

Exercise 19.2.6 Small variation on the proof of Lemma 19.2.5. 

Exercise 19.2.7 So there exists a set U, C €? with cardU, < 5°, |lul| < 
2a, and Boy, a,) C convU,. Let U = Uno U;,, so that given a > 0, we have 
Na := card{u € U ; |lul]| > a} < {54% ; 2a, > a}. Now for 2a, > a, 
we have n + 1 < exp(4/a7), and since 54!" < (n + 1)!°8°, we have Ny < 
exp(L/a). Thus, if we enumerate U as a sequence (ux) such that the sequence 
(||uxl|)k>1 is non-increasing, we have ||ux|| < L/,/log(k + 1). On the other hand, 
SUP,>1 Gn (Mies, git S sup, iri Uk i 8i- 

Exercise 19.2.8 We use the following form of (2.61): $\mathsf P(\sup_{s,t \in T_n}|X_s - X_t| \ge
L(g(T_n) + ub_n)) \le L\exp(-u^2)$, and by the union bound as usual

$$\mathsf P\Big(\exists n \ge 1, \ \sup_{s,t \in T_n}|X_s - X_t| \ge Lu\big(g(T_n) + b_n\sqrt{\log(n+1)}\big)\Big) \le L\exp(-u^2)$$

and hence $\mathsf E\sup_{n;\, s,t \in T_n}|X_s - X_t| \le L\sup_n\big(g(T_n) + b_n\sqrt{\log(n+1)}\big)$, which
implies (19.61). We then apply this inequality to the case T, = BoUn, dn). 

Exercise 19.2.16 Consider a set $J_n \supset I_n$ with $\log(n + 1) \le \mathrm{card}\,J_n \le 2\log(n +
1)$. Then


1 2\” : » 2" 
|x S LS sup (= =i >) as sue (Sar, ‘) 


21 i€In n21 ieJ, 
Exercise 19.2.18 The set $T$ of (19.83) satisfies $T \subset LSB_1$ by a simple adaptation
of Lemma 19.2.17, and it satisfies $\gamma_1(T, d_\infty) \le LS$ by Theorem 8.3.3. The result
follows from Theorem 19.2.10 as in the proof of Theorem 8.3.3.

Exercise 19.3.6 The result of Proposition 2(ii) of [24] states (changing $\epsilon$ into
$\epsilon/5$ and taking $\delta = 2\epsilon$) that


N(Xj, doo, €) S N(X7, 500, 26)N(W, |] - |], €/K) . 


Using (19.123) to bound the last term, we obtain 
“ - S\P 
log N(X}, doo, #) < log N(X}, 800, 26) + K(p, m)(=)" log N’, 
€ 


from which the desired result follows by iteration.

Exercise A.1.2 When there is a matching $\pi$ between the points $X_i$ and evenly
spread points $Y_i$ with $d(X_i, Y_{\pi(i)}) \le A$, then for any set $C$, we have $\mathrm{card}\{i \le
N ; X_i \in C\} \le \mathrm{card}\{i \le N ; d(Y_i, C) \le A\}$. Any point $Y_i$ with $d(Y_i, C) \le A$
is such that every point of the corresponding little rectangle is within distance $\le
A + L/\sqrt N$ of $C$. When $C$ is the interior of a curve of length $\le 1$, the set of points
within distance $\epsilon$ of $C$ has area $\le \lambda(C) + L\epsilon$. For $A \ge 1/\sqrt N$, this proves that
$\mathrm{card}\{i \le N ; d(Y_i, C) \le A\} \le N\lambda(C) + LNA$. This provides an upper bound
$\mathrm{card}\{i \le N ; X_i \in C\} \le N\lambda(C) + LNA$. The lower bound is similar.

Exercise C.2.1 This is because (C.7) is satisfied for $\nu = \sum_{u \in T}\mu_u$, where $\mu_u$ is
the image of $\nu_u$ under the map $\varphi_u : \mathbb R \to \mathbb R^T$ given by $\varphi_u(x) = (\beta(t))_{t \in T}$, where
$\beta(u) = x$ and $\beta(t) = 0$ for $t \ne u$, and where $\nu_u$ is the measure which satisfies (C.5)
for $X_u$.


Appendix G 
Comparison with the First Edition 


This section will try to answer two questions: 


• If you have some knowledge of the first edition (hereafter referred to as OE),
what can you find of interest in the present edition?

• If you bought the present edition (hereafter referred to as NE), may you find
anything of interest in OE?


The short answer to the first question is yes: there have been some
dramatic improvements in some key mathematics (both in the proofs and in the results
themselves). The answer to the second question is no, unless you are a specialist in
Banach space theory.

Generally speaking, the entire text has been revised and polished, so at every 
place, NE should be better than OE. Greater attention has been paid to pedagogy, 
by breaking long proofs into smaller pieces which are made to stand out on their 
own. Some points, however, are a matter of taste. More variations on the theory 
of “functionals” are presented in OE, although nothing essential is omitted here. 
Another technical choice which is a matter of taste is as follows. In the basic 
constructions of partitions in metric spaces, in OE, the size of the pieces is controlled 
by the radius of these pieces, whereas in NE, it is controlled by their diameter. This
leads to slightly simpler proofs at the expense of somewhat worse numerical constants
(whose value is anyway irrelevant).

At the level of global organization, a major decision was to present the proof of
Theorem 6.2.8 (the Latała-Bednorz theorem) at a later stage of the book, in the tenth
chapter rather than in the fifth. The author started working on this problem as soon
as he identified it, around 1989, and a significant part of the results of this book
were discovered during this effort. Although, strictly speaking, some of these results
are not needed to understand the proof of the Latała-Bednorz result, the underlying
ideas are part of that proof. It might require superhuman dedication to understand
the proof of the Latała-Bednorz theorem as it is given in OE, but now we try to
prepare the reader by studying random Fourier series and families of distances first.
Although we could not simplify the proof of the Latała-Bednorz result itself, we
tried to prepare the reader by working backward and rewriting the proofs of the
other “partition schemes” of the same general nature in exactly the same form as
will be done for this theorem. Thus, if OE frustrated your attempts to understand
this proof, you may try again, reading in order Chaps. 2, 6, 7, 9, and 10.

A major difference between OE and NE is that a number of proofs are now considerably
shorter. At some places, like Chap. 18, this was simply achieved by reworking the
arguments, but at many other places, the mathematics are just better. Most of the
improvements can be traced back to a simple new idea: the method which we use
to provide lower bounds for random Fourier series (the key step of which is Lemma 7.7.3)
can be generalized to conditionally Bernoulli processes by appealing to
Theorem 10.15.1 (which makes full use of the Latała-Bednorz theorem), leading
to Lemma 11.4.1. The author then combined this idea with the idea of witnessing
measures (as in Sect. 3.4), which replace the use of the Haar measure on groups.
Then Witold Bednorz and Rafał Martynek [18] observed that in the case of infinitely
divisible processes, this method could be combined with Fernique’s convexity
argument to remove a technical condition the author was assuming. As the case of
infinitely divisible processes had been showcased by the author precisely because
it looked like an entry door to more general situations (such as those of empirical
processes), this shortly led to a positive solution of three of the main conjectures
of OE, which are presented, respectively, in Theorems 6.8.3, 11.12.1, and 12.3.5.

We have considerably shortened Chap. 19 for the simple reason that the field
of Banach spaces attracts much less attention than it used to. We have kept
only the topics which are very directly related to other material in the book. This
is the one single area where the specialist may like to look at OE. We have also
deleted results and arguments which are too tedious or too specialized compared
to what they achieve. For example, we have deleted parts of the proof of Shor’s
matching theorem (Theorem 17.1.3), as the method we follow there cannot yield
an optimal result. It serves no purpose to make an exhaustive list of the deleted
results; they are mentioned at the relevant places in the present text for the sake of
the (purely hypothetical) reader who really wants to master all details and go fetch
them in OE. The single simple result which we have not reproduced and which is
not too specialized is the abstract version of the Burkholder-Davis-Gundy inequality of
Appendix A.6 of OE (while the other material of this appendix has now found its
way elsewhere).